2025-12-04T09:32:17.1571802Z Current runner version: '2.330.0'
2025-12-04T09:32:17.1579224Z Runner name: 'i-03bbda7791efb68ed'
2025-12-04T09:32:17.1580145Z Runner group name: 'default'
2025-12-04T09:32:17.1581201Z Machine name: 'ip-10-0-76-64'
2025-12-04T09:32:17.1584412Z ##[group]GITHUB_TOKEN Permissions
2025-12-04T09:32:17.1586989Z Contents: read
2025-12-04T09:32:17.1587728Z Metadata: read
2025-12-04T09:32:17.1588348Z ##[endgroup]
2025-12-04T09:32:17.1590877Z Secret source: Actions
2025-12-04T09:32:17.1591837Z Prepare workflow directory
2025-12-04T09:32:17.2182965Z Prepare all required actions
2025-12-04T09:32:17.2231197Z Getting action download info
2025-12-04T09:32:17.5706473Z Download action repository 'pytorch/test-infra@main' (SHA:39aa74d619174326f4e2fb0e216151c2f29d9ffd)
2025-12-04T09:32:19.8499724Z Download action repository 'pytorch/pytorch@main' (SHA:7716da9fb23f27a65b41f9f016a2afadf281c18f)
2025-12-04T09:32:34.8048555Z Download action repository 'actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065' (SHA:a26af69be951a213d495a4c3e4e4022e16d87065)
2025-12-04T09:32:35.1546803Z Download action repository 'aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722' (SHA:ececac1a45f3b08a01d2dd070d28d111c5fe6722)
2025-12-04T09:32:35.4135266Z Download action repository 'aws-actions/amazon-ecr-login@062b18b96a7aff071d4dc91bc00c4c1a7945b076' (SHA:062b18b96a7aff071d4dc91bc00c4c1a7945b076)
2025-12-04T09:32:35.5943042Z Download action repository 'seemethere/download-artifact-s3@1da556a7aa0a088e3153970611f6c432d58e80e6' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:32:35.8374692Z Download action repository 'seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T09:32:36.1592452Z Getting action download info
2025-12-04T09:32:36.2871116Z Download action repository 'actions/checkout@v4' (SHA:34e114876b0b11c390a56381ad16ebd13914f8d5)
2025-12-04T09:32:36.5900090Z Getting action download info
2025-12-04T09:32:36.7171641Z Download action repository 'nick-fields/retry@v3.0.0' (SHA:7152eba30c6575329ac0576536151aca5a72780e)
2025-12-04T09:32:36.9576703Z Getting action download info
2025-12-04T09:32:37.0751538Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2025-12-04T09:32:37.2915476Z Getting action download info
2025-12-04T09:32:37.5288953Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/main (ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32)
2025-12-04T09:32:37.5293713Z ##[group] Inputs
2025-12-04T09:32:37.5294164Z   build-environment: linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:32:37.5301892Z   test-matrix: {"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}
2025-12-04T09:32:37.5310111Z   docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:32:37.5311124Z   sync-tag: 
2025-12-04T09:32:37.5312091Z   timeout-minutes: 240
2025-12-04T09:32:37.5312380Z   use-gha: 
2025-12-04T09:32:37.5312635Z   dashboard-tag: 
2025-12-04T09:32:37.5312918Z   s3-bucket: gha-artifacts
2025-12-04T09:32:37.5313221Z   aws-role-to-assume: 
2025-12-04T09:32:37.5313882Z   disable-monitor: false
2025-12-04T09:32:37.5314240Z   monitor-log-interval: 5
2025-12-04T09:32:37.5314592Z   monitor-data-collect-interval: 1
2025-12-04T09:32:37.5314980Z ##[endgroup]
2025-12-04T09:32:37.5315770Z Complete job name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:32:37.5897954Z A job started hook has been configured by the self-hosted runner administrator
2025-12-04T09:32:37.6013766Z ##[group]Run '/home/ec2-user/runner-scripts/before_job.sh'
2025-12-04T09:32:37.6024261Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:32:37.6025026Z ##[endgroup]
2025-12-04T09:32:39.1448433Z Runner Type: linux.g4dn.4xlarge.nvidia.gpu
2025-12-04T09:32:39.1449080Z Instance Type: g4dn.4xlarge
2025-12-04T09:32:39.1449393Z AMI Name: unknown
2025-12-04T09:32:39.1491392Z AMI ID: ami-08982f1c5bf93d976
2025-12-04T09:32:45.3827271Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2025-12-04T09:32:45.3827792Z with:
2025-12-04T09:32:45.3828431Z   github-secret: ***
2025-12-04T09:32:45.3829289Z   instructions: All testing is done inside the container, to start an interactive session run:
  docker exec -it $(docker container ps --format '{{.ID}}') bash

2025-12-04T09:32:45.3830252Z   activate-with-label: false
2025-12-04T09:32:45.3830572Z   label: with-ssh
2025-12-04T09:32:45.3830860Z   remove-existing-keys: true
2025-12-04T09:32:45.3831185Z   fail-silently: true
2025-12-04T09:32:45.3831452Z env:
2025-12-04T09:32:45.3831700Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.3832016Z ##[endgroup]
2025-12-04T09:32:45.5466988Z Please see https://github.com/pytorch/pytorch/wiki/Debugging-using-with-ssh-for-Github-Actions for more info.
2025-12-04T09:32:45.5468752Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys
2025-12-04T09:32:45.5846986Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@main
2025-12-04T09:32:45.5847496Z with:
2025-12-04T09:32:45.5847750Z   no-sudo: true
2025-12-04T09:32:45.5848027Z   submodules: recursive
2025-12-04T09:32:45.5848328Z   fetch-depth: 0
2025-12-04T09:32:45.5848614Z env:
2025-12-04T09:32:45.5848860Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.5849153Z ##[endgroup]
2025-12-04T09:32:45.5934616Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:32:45.5935773Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:32:45.5946549Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:32:45.5947013Z env:
2025-12-04T09:32:45.5947283Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.5947629Z ##[endgroup]
2025-12-04T09:32:45.6037671Z ##[group]Run # Use all available CPUs for fetching
2025-12-04T09:32:45.6038206Z # Use all available CPUs for fetching
2025-12-04T09:32:45.6038619Z cd "${GITHUB_WORKSPACE}"
2025-12-04T09:32:45.6039006Z git config --global fetch.parallel 0
2025-12-04T09:32:45.6039686Z git config --global submodule.fetchJobs 0
2025-12-04T09:32:45.6040090Z 
2025-12-04T09:32:45.6040503Z # Clean workspace. The default checkout action should also do this, but
2025-12-04T09:32:45.6041071Z # do it here as well just in case
2025-12-04T09:32:45.6041448Z if [[ -d .git ]]; then
2025-12-04T09:32:45.6041809Z   if [ -z "${NO_SUDO}" ]; then
2025-12-04T09:32:45.6042184Z     sudo git clean -ffdx
2025-12-04T09:32:45.6042613Z   else
2025-12-04T09:32:45.6042895Z     git clean -ffdx
2025-12-04T09:32:45.6043212Z   fi
2025-12-04T09:32:45.6043467Z fi
2025-12-04T09:32:45.6050120Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:32:45.6050570Z env:
2025-12-04T09:32:45.6050919Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.6051240Z   NO_SUDO: true
2025-12-04T09:32:45.6051515Z ##[endgroup]
2025-12-04T09:32:45.6184267Z ##[group]Run actions/checkout@v4
2025-12-04T09:32:45.6184653Z with:
2025-12-04T09:32:45.6184944Z   ref: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:32:45.6185328Z   fetch-depth: 0
2025-12-04T09:32:45.6185606Z   submodules: recursive
2025-12-04T09:32:45.6185909Z   show-progress: false
2025-12-04T09:32:45.6186223Z   repository: pytorch/pytorch
2025-12-04T09:32:45.6186692Z   token: ***
2025-12-04T09:32:45.6186952Z   ssh-strict: true
2025-12-04T09:32:45.6187229Z   ssh-user: git
2025-12-04T09:32:45.6187501Z   persist-credentials: true
2025-12-04T09:32:45.6187820Z   clean: true
2025-12-04T09:32:45.6188122Z   sparse-checkout-cone-mode: true
2025-12-04T09:32:45.6188457Z   fetch-tags: false
2025-12-04T09:32:45.6188730Z   lfs: false
2025-12-04T09:32:45.6189003Z   set-safe-directory: true
2025-12-04T09:32:45.6189303Z env:
2025-12-04T09:32:45.6189551Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:32:45.6189849Z ##[endgroup]
2025-12-04T09:32:45.7433515Z Syncing repository: pytorch/pytorch
2025-12-04T09:32:45.7435128Z ##[group]Getting Git version info
2025-12-04T09:32:45.7435740Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T09:32:45.7436542Z [command]/usr/bin/git version
2025-12-04T09:32:45.7634283Z git version 2.50.1
2025-12-04T09:32:45.7663559Z ##[endgroup]
2025-12-04T09:32:45.7674888Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/dbedb87e-7286-4c3b-9e34-21fce791ca44/.gitconfig'
2025-12-04T09:32:45.7694585Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/dbedb87e-7286-4c3b-9e34-21fce791ca44' before making global git config changes
2025-12-04T09:32:45.7695805Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T09:32:45.7700257Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:32:45.7746617Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2025-12-04T09:32:45.7749932Z ##[group]Initializing the repository
2025-12-04T09:32:45.7754496Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:32:45.7817587Z hint: Using 'master' as the name for the initial branch. This default branch name
2025-12-04T09:32:45.7818312Z hint: is subject to change. To configure the initial branch name to use in all
2025-12-04T09:32:45.7818995Z hint: of your new repositories, which will suppress this warning, call:
2025-12-04T09:32:45.7819476Z hint:
2025-12-04T09:32:45.7819822Z hint: 	git config --global init.defaultBranch <name>
2025-12-04T09:32:45.7820238Z hint:
2025-12-04T09:32:45.7820613Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2025-12-04T09:32:45.7821312Z hint: 'development'. The just-created branch can be renamed via this command:
2025-12-04T09:32:45.7821852Z hint:
2025-12-04T09:32:45.7822093Z hint: 	git branch -m <name>
2025-12-04T09:32:45.7822405Z hint:
2025-12-04T09:32:45.7822836Z hint: Disable this message with "git config set advice.defaultBranchName false"
2025-12-04T09:32:45.7826976Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2025-12-04T09:32:45.7836956Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2025-12-04T09:32:45.7877034Z ##[endgroup]
2025-12-04T09:32:45.7877549Z ##[group]Disabling automatic garbage collection
2025-12-04T09:32:45.7881330Z [command]/usr/bin/git config --local gc.auto 0
2025-12-04T09:32:45.7910139Z ##[endgroup]
2025-12-04T09:32:45.7910703Z ##[group]Setting up auth
2025-12-04T09:32:45.7917074Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T09:32:45.7946009Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T09:32:45.8314116Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T09:32:45.8344416Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T09:32:45.8666767Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T09:32:45.8697670Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T09:32:45.9008780Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T09:32:45.9064590Z ##[endgroup]
2025-12-04T09:32:45.9065134Z ##[group]Fetching the repository
2025-12-04T09:32:45.9074250Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2025-12-04T09:33:41.3065034Z From https://github.com/pytorch/pytorch
2025-12-04T09:33:41.3065625Z  * [new branch]              2.6.0.dev20241004+          -> origin/2.6.0.dev20241004+
2025-12-04T09:33:41.3066357Z  * [new branch]              2.9.1                       -> origin/2.9.1
2025-12-04T09:33:41.3067055Z  * [new branch]              AaronWang04_addmmfusion_perftest -> origin/AaronWang04_addmmfusion_perftest
2025-12-04T09:33:41.3067828Z  * [new branch]              Flamefire-patch-1           -> origin/Flamefire-patch-1
2025-12-04T09:33:41.3068838Z  * [new branch]              HDCharles-2.6.0-release-notes -> origin/HDCharles-2.6.0-release-notes
2025-12-04T09:33:41.3070155Z  * [new branch]              HOPrintFunc                 -> origin/HOPrintFunc
2025-12-04T09:33:41.3072678Z  * [new branch]              IvanKobzarev/stack/1        -> origin/IvanKobzarev/stack/1
2025-12-04T09:33:41.3074962Z  * [new branch]              NicoshevSVE128              -> origin/NicoshevSVE128
2025-12-04T09:33:41.3076049Z  * [new branch]              PR-AOTInductorNoneBug       -> origin/PR-AOTInductorNoneBug
2025-12-04T09:33:41.3077620Z  * [new branch]              PR-AOTInductorNoneBugFix    -> origin/PR-AOTInductorNoneBugFix
2025-12-04T09:33:41.3078743Z  * [new branch]              PR-FixConfigsIssue          -> origin/PR-FixConfigsIssue
2025-12-04T09:33:41.3080004Z  * [new branch]              PR-NoneBugFix-viable        -> origin/PR-NoneBugFix-viable
2025-12-04T09:33:41.3081338Z  * [new branch]              PR-ResetToZero              -> origin/PR-ResetToZero
2025-12-04T09:33:41.3083085Z  * [new branch]              Update-Flash-Packaging      -> origin/Update-Flash-Packaging
2025-12-04T09:33:41.3084190Z  * [new branch]              VLA_exp                     -> origin/VLA_exp
2025-12-04T09:33:41.3085861Z  * [new branch]              activation_bench            -> origin/activation_bench
2025-12-04T09:33:41.3087801Z  * [new branch]              addmm-heuristic             -> origin/addmm-heuristic
2025-12-04T09:33:41.3089629Z  * [new branch]              adi/onednn_aarch64          -> origin/adi/onednn_aarch64
2025-12-04T09:33:41.3090833Z  * [new branch]              adi/test                    -> origin/adi/test
2025-12-04T09:33:41.3092179Z  * [new branch]              adi/test_bgemm              -> origin/adi/test_bgemm
2025-12-04T09:33:41.3093599Z  * [new branch]              adi/test_m8g                -> origin/adi/test_m8g
2025-12-04T09:33:41.3094862Z  * [new branch]              adi/test_onednn             -> origin/adi/test_onednn
2025-12-04T09:33:41.3096131Z  * [new branch]              adi/test_onednn_v3.9        -> origin/adi/test_onednn_v3.9
2025-12-04T09:33:41.3097580Z  * [new branch]              adi/test_presve_change      -> origin/adi/test_presve_change
2025-12-04T09:33:41.3098687Z  * [new branch]              adi/test_timm               -> origin/adi/test_timm
2025-12-04T09:33:41.3100474Z  * [new branch]              adi/testpresve_change       -> origin/adi/testpresve_change
2025-12-04T09:33:41.3103178Z  * [new branch]              aditew01/test/vec_bf16      -> origin/aditew01/test/vec_bf16
2025-12-04T09:33:41.3104401Z  * [new branch]              ah-globalfeedback-hook      -> origin/ah-globalfeedback-hook
2025-12-04T09:33:41.3106088Z  * [new branch]              albanD-patch-1              -> origin/albanD-patch-1
2025-12-04T09:33:41.3107196Z  * [new branch]              also-surround-shimh         -> origin/also-surround-shimh
2025-12-04T09:33:41.3109094Z  * [new branch]              angelayi/aot_compile        -> origin/angelayi/aot_compile
2025-12-04T09:33:41.3110388Z  * [new branch]              angelayi/aoti_additional_files -> origin/angelayi/aoti_additional_files
2025-12-04T09:33:41.3111631Z  * [new branch]              angelayi/benchmark          -> origin/angelayi/benchmark
2025-12-04T09:33:41.3112964Z  * [new branch]              angelayi/change_pytree_serialization -> origin/angelayi/change_pytree_serialization
2025-12-04T09:33:41.3114096Z  * [new branch]              angelayi/cpp_loader         -> origin/angelayi/cpp_loader
2025-12-04T09:33:41.3115416Z  * [new branch]              angelayi/inductor_const     -> origin/angelayi/inductor_const
2025-12-04T09:33:41.3116575Z  * [new branch]              angelayi/lstm               -> origin/angelayi/lstm
2025-12-04T09:33:41.3118397Z  * [new branch]              angelayi/no_so_weight       -> origin/angelayi/no_so_weight
2025-12-04T09:33:41.3120136Z  * [new branch]              angelayi/scan_layers        -> origin/angelayi/scan_layers
2025-12-04T09:33:41.3121356Z  * [new branch]              angelayi/side_eff           -> origin/angelayi/side_eff
2025-12-04T09:33:41.3122984Z  * [new branch]              angelayi/state_dict         -> origin/angelayi/state_dict
2025-12-04T09:33:41.3124228Z  * [new branch]              angelayi/symint_input       -> origin/angelayi/symint_input
2025-12-04T09:33:41.3125796Z  * [new branch]              angelayi/symm_mem           -> origin/angelayi/symm_mem
2025-12-04T09:33:41.3126865Z  * [new branch]              angelayi/test_cpp           -> origin/angelayi/test_cpp
2025-12-04T09:33:41.3128170Z  * [new branch]              angelayi/torch_size         -> origin/angelayi/torch_size
2025-12-04T09:33:41.3129468Z  * [new branch]              annotate_assert             -> origin/annotate_assert
2025-12-04T09:33:41.3130982Z  * [new branch]              annotate_fallback_kernel    -> origin/annotate_fallback_kernel
2025-12-04T09:33:41.3132209Z  * [new branch]              annotation_deepcopy         -> origin/annotation_deepcopy
2025-12-04T09:33:41.3133520Z  * [new branch]              annotation_dynamo           -> origin/annotation_dynamo
2025-12-04T09:33:41.3134828Z  * [new branch]              aot_eager_stack_trace       -> origin/aot_eager_stack_trace
2025-12-04T09:33:41.3136282Z  * [new branch]              aoti-cuda-alloc             -> origin/aoti-cuda-alloc
2025-12-04T09:33:41.3137462Z  * [new branch]              aoti_const_device           -> origin/aoti_const_device
2025-12-04T09:33:41.3138769Z  * [new branch]              aoti_fqn_name_interface     -> origin/aoti_fqn_name_interface
2025-12-04T09:33:41.3140043Z  * [new branch]              aoti_package_weights_binary -> origin/aoti_package_weights_binary
2025-12-04T09:33:41.3141298Z  * [new branch]              aoti_target_windows         -> origin/aoti_target_windows
2025-12-04T09:33:41.3143801Z  * [new branch]              arsh/feat/inductor_check_profiling -> origin/arsh/feat/inductor_check_profiling
2025-12-04T09:33:41.3144831Z  * [new branch]              async_tp                    -> origin/async_tp
2025-12-04T09:33:41.3146440Z  * [new branch]              atalman-inductor-perf-cu124 -> origin/atalman-inductor-perf-cu124
2025-12-04T09:33:41.3147684Z  * [new branch]              atalman-inductor-perf-cu124.1 -> origin/atalman-inductor-perf-cu124.1
2025-12-04T09:33:41.3149019Z  * [new branch]              atalman-patch-2             -> origin/atalman-patch-2
2025-12-04T09:33:41.3150518Z  * [new branch]              atalman-patch-3             -> origin/atalman-patch-3
2025-12-04T09:33:41.3151756Z  * [new branch]              atalman-patch-4             -> origin/atalman-patch-4
2025-12-04T09:33:41.3153289Z  * [new branch]              atalman-patch-5             -> origin/atalman-patch-5
2025-12-04T09:33:41.3154542Z  * [new branch]              atalman-patch-6             -> origin/atalman-patch-6
2025-12-04T09:33:41.3156027Z  * [new branch]              atalman-patch-7             -> origin/atalman-patch-7
2025-12-04T09:33:41.3157431Z  * [new branch]              atalman-patch-8             -> origin/atalman-patch-8
2025-12-04T09:33:41.3158659Z  * [new branch]              atalman_inductor_2.3.1      -> origin/atalman_inductor_2.3.1
2025-12-04T09:33:41.3159991Z  * [new branch]              atalman_inductor_2.4.0      -> origin/atalman_inductor_2.4.0
2025-12-04T09:33:41.3161484Z  * [new branch]              atalman_inductor_2.4.x      -> origin/atalman_inductor_2.4.x
2025-12-04T09:33:41.3163104Z  * [new branch]              attention_benchmarking_clean -> origin/attention_benchmarking_clean
2025-12-04T09:33:41.3164868Z  * [new branch]              bahuang/dt_fix_scalar_add   -> origin/bahuang/dt_fix_scalar_add
2025-12-04T09:33:41.3165971Z  * [new branch]              bahuang/fix_debug_mode      -> origin/bahuang/fix_debug_mode
2025-12-04T09:33:41.3167277Z  * [new branch]              bahuang/fix_expand          -> origin/bahuang/fix_expand
2025-12-04T09:33:41.3168774Z  * [new branch]              bahuang/test                -> origin/bahuang/test
2025-12-04T09:33:41.3170623Z  * [new branch]              base/1.5                    -> origin/base/1.5
2025-12-04T09:33:41.3172224Z  * [new branch]              batching_sdpa_efficient_attention -> origin/batching_sdpa_efficient_attention
2025-12-04T09:33:41.3173413Z  * [new branch]              bench_scaled_mm_ops         -> origin/bench_scaled_mm_ops
2025-12-04T09:33:41.3174830Z  * [new branch]              benchmark-updates           -> origin/benchmark-updates
2025-12-04T09:33:41.3176082Z  * [new branch]              benchmarking-script         -> origin/benchmarking-script
2025-12-04T09:33:41.3177968Z  * [new branch]              bertmaher/pinbump26         -> origin/bertmaher/pinbump26
2025-12-04T09:33:41.3179704Z  * [new branch]              bertrand/cutlass            -> origin/bertrand/cutlass
2025-12-04T09:33:41.3181484Z  * [new branch]              bf/bug-static-input         -> origin/bf/bug-static-input
2025-12-04T09:33:41.3182551Z  * [new branch]              bf/cg-backend               -> origin/bf/cg-backend
2025-12-04T09:33:41.3183814Z  * [new branch]              bf/cg-nccl-test             -> origin/bf/cg-nccl-test
2025-12-04T09:33:41.3185030Z  * [new branch]              bf/cg-remove-check          -> origin/bf/cg-remove-check
2025-12-04T09:33:41.3186560Z  * [new branch]              bf/clean-torchbench-hf      -> origin/bf/clean-torchbench-hf
2025-12-04T09:33:41.3187674Z  * [new branch]              bf/combo-debug-log          -> origin/bf/combo-debug-log
2025-12-04T09:33:41.3188893Z  * [new branch]              bf/cudagraph                -> origin/bf/cudagraph
2025-12-04T09:33:41.3190864Z  * [new branch]              bf/cudagraph-disable-input-mutation -> origin/bf/cudagraph-disable-input-mutation
2025-12-04T09:33:41.3192417Z  * [new branch]              bf/cudagraph-enable-input-mutation-support-benchmark -> origin/bf/cudagraph-enable-input-mutation-support-benchmark
2025-12-04T09:33:41.3193489Z  * [new branch]              bf/cudagraph-partition      -> origin/bf/cudagraph-partition
2025-12-04T09:33:41.3194682Z  * [new branch]              bf/donated-buffer-bench     -> origin/bf/donated-buffer-bench
2025-12-04T09:33:41.3196036Z  * [new branch]              bf/dynamo-partition         -> origin/bf/dynamo-partition
2025-12-04T09:33:41.3197251Z  * [new branch]              bf/lite                     -> origin/bf/lite
2025-12-04T09:33:41.3198646Z  * [new branch]              bf/pa-non-divisible         -> origin/bf/pa-non-divisible
2025-12-04T09:33:41.3199990Z  * [new branch]              bf/partition-cache-free-symbols -> origin/bf/partition-cache-free-symbols
2025-12-04T09:33:41.3201601Z  * [new branch]              bf/partition-memory-plan    -> origin/bf/partition-memory-plan
2025-12-04T09:33:41.3203081Z  * [new branch]              bf/partition-move-cpu       -> origin/bf/partition-move-cpu
2025-12-04T09:33:41.3204505Z  * [new branch]              bf/partition-view-fallback  -> origin/bf/partition-view-fallback
2025-12-04T09:33:41.3205759Z  * [new branch]              bf/remove-check-55b0c39d    -> origin/bf/remove-check-55b0c39d
2025-12-04T09:33:41.3207004Z  * [new branch]              bf/timm-nov-26-2025         -> origin/bf/timm-nov-26-2025
2025-12-04T09:33:41.3208343Z  * [new branch]              bf/transformer-pin-4-57-3   -> origin/bf/transformer-pin-4-57-3
2025-12-04T09:33:41.3209728Z  * [new branch]              bisect_perf_hf_T5_3acc6eac492 -> origin/bisect_perf_hf_T5_3acc6eac492
2025-12-04T09:33:41.3210972Z  * [new branch]              bisect_perf_hf_T5_3fcf66f61fb -> origin/bisect_perf_hf_T5_3fcf66f61fb
2025-12-04T09:33:41.3212227Z  * [new branch]              bisect_perf_hf_T5_4009d154129 -> origin/bisect_perf_hf_T5_4009d154129
2025-12-04T09:33:41.3213470Z  * [new branch]              bisect_perf_hf_T5_40d0740e73d -> origin/bisect_perf_hf_T5_40d0740e73d
2025-12-04T09:33:41.3214677Z  * [new branch]              bisect_perf_hf_T5_5268754e  -> origin/bisect_perf_hf_T5_5268754e
2025-12-04T09:33:41.3215959Z  * [new branch]              bisect_perf_hf_T5_7d89a8d385c -> origin/bisect_perf_hf_T5_7d89a8d385c
2025-12-04T09:33:41.3217171Z  * [new branch]              bisect_perf_hf_T5_b7a25c1ee7c -> origin/bisect_perf_hf_T5_b7a25c1ee7c
2025-12-04T09:33:41.3218384Z  * [new branch]              bisect_perf_hf_T5_c25b201583f -> origin/bisect_perf_hf_T5_c25b201583f
2025-12-04T09:33:41.3219711Z  * [new branch]              bisect_perf_hf_T5_c93e57efac0 -> origin/bisect_perf_hf_T5_c93e57efac0
2025-12-04T09:33:41.3221204Z  * [new branch]              bisect_perf_hf_T5_ca9813ea149 -> origin/bisect_perf_hf_T5_ca9813ea149
2025-12-04T09:33:41.3222333Z  * [new branch]              bisect_perf_hf_T5_d65f194a  -> origin/bisect_perf_hf_T5_d65f194a
2025-12-04T09:33:41.3223571Z  * [new branch]              bisect_perf_hf_T5_da94ab0b  -> origin/bisect_perf_hf_T5_da94ab0b
2025-12-04T09:33:41.3224909Z  * [new branch]              bisect_perf_hf_T5_da94ab0b_new -> origin/bisect_perf_hf_T5_da94ab0b_new
2025-12-04T09:33:41.3226117Z  * [new branch]              bisect_perf_hf_T5_db4e8a1d8a8 -> origin/bisect_perf_hf_T5_db4e8a1d8a8
2025-12-04T09:33:41.3227360Z  * [new branch]              bisect_perf_hf_T5_e0d97e936a2 -> origin/bisect_perf_hf_T5_e0d97e936a2
2025-12-04T09:33:41.3228595Z  * [new branch]              bisect_perf_hf_T5_f23621ec563 -> origin/bisect_perf_hf_T5_f23621ec563
2025-12-04T09:33:41.3230543Z  * [new branch]              brister/fx_device_type      -> origin/brister/fx_device_type
2025-12-04T09:33:41.3231761Z  * [new branch]              brister/test_inductor_all_fx -> origin/brister/test_inductor_all_fx
2025-12-04T09:33:41.3233128Z  * [new branch]              brister/tiled_reduction_no_numel_check -> origin/brister/tiled_reduction_no_numel_check
2025-12-04T09:33:41.3234317Z  * [new branch]              bwd-backup                  -> origin/bwd-backup
2025-12-04T09:33:41.3235836Z  * [new branch]              c57382a49                   -> origin/c57382a49
2025-12-04T09:33:41.3236999Z  * [new branch]              ca_0431d47eaa               -> origin/ca_0431d47eaa
2025-12-04T09:33:41.3238260Z  * [new branch]              ca_fix_0431d47eaa           -> origin/ca_fix_0431d47eaa
2025-12-04T09:33:41.3240263Z  * [new branch]              camyllh/test_setup_hooks_push -> origin/camyllh/test_setup_hooks_push
2025-12-04T09:33:41.3241562Z  * [new branch]              cccclai-patch-1             -> origin/cccclai-patch-1
2025-12-04T09:33:41.3243294Z  * [new branch]              cherry-pick-159969-by-pytorch_bot_bot_ -> origin/cherry-pick-159969-by-pytorch_bot_bot_
2025-12-04T09:33:41.3244534Z  * [new branch]              cherry-pick-160586-by-pytorch_bot_bot_ -> origin/cherry-pick-160586-by-pytorch_bot_bot_
2025-12-04T09:33:41.3245968Z  * [new branch]              cherry-pick-162208-by-pytorch_bot_bot_ -> origin/cherry-pick-162208-by-pytorch_bot_bot_
2025-12-04T09:33:41.3247369Z  * [new branch]              cherry-pick-163169-by-pytorch_bot_bot_ -> origin/cherry-pick-163169-by-pytorch_bot_bot_
2025-12-04T09:33:41.3248724Z  * [new branch]              cherry-pick-165086-by-pytorch_bot_bot_ -> origin/cherry-pick-165086-by-pytorch_bot_bot_
2025-12-04T09:33:41.3250240Z  * [new branch]              cherry-pick-165514-by-pytorch_bot_bot_ -> origin/cherry-pick-165514-by-pytorch_bot_bot_
2025-12-04T09:33:41.3251528Z  * [new branch]              cherry-pick-165601-by-pytorch_bot_bot_ -> origin/cherry-pick-165601-by-pytorch_bot_bot_
2025-12-04T09:33:41.3252887Z  * [new branch]              cherry-pick-165667-by-pytorch_bot_bot_ -> origin/cherry-pick-165667-by-pytorch_bot_bot_
2025-12-04T09:33:41.3254338Z  * [new branch]              cherry-pick-165815-by-pytorch_bot_bot_ -> origin/cherry-pick-165815-by-pytorch_bot_bot_
2025-12-04T09:33:41.3255699Z  * [new branch]              cherry-pick-165922-by-pytorch_bot_bot_ -> origin/cherry-pick-165922-by-pytorch_bot_bot_
2025-12-04T09:33:41.3257057Z  * [new branch]              cherry-pick-166148-by-pytorch_bot_bot_ -> origin/cherry-pick-166148-by-pytorch_bot_bot_
2025-12-04T09:33:41.3258371Z  * [new branch]              cherry-pick-166181-by-pytorch_bot_bot_ -> origin/cherry-pick-166181-by-pytorch_bot_bot_
2025-12-04T09:33:41.3259688Z  * [new branch]              cherry-pick-166404-by-pytorch_bot_bot_ -> origin/cherry-pick-166404-by-pytorch_bot_bot_
2025-12-04T09:33:41.3261054Z  * [new branch]              cherry-pick-166427-by-pytorch_bot_bot_ -> origin/cherry-pick-166427-by-pytorch_bot_bot_
2025-12-04T09:33:41.3262489Z  * [new branch]              cherry-pick-166480-by-pytorch_bot_bot_ -> origin/cherry-pick-166480-by-pytorch_bot_bot_
2025-12-04T09:33:41.3263771Z  * [new branch]              cherry-pick-166570-by-pytorch_bot_bot_ -> origin/cherry-pick-166570-by-pytorch_bot_bot_
2025-12-04T09:33:41.3265114Z  * [new branch]              cherry-pick-166993-by-pytorch_bot_bot_ -> origin/cherry-pick-166993-by-pytorch_bot_bot_
2025-12-04T09:33:41.3266481Z  * [new branch]              cherry-pick-167111-by-pytorch_bot_bot_ -> origin/cherry-pick-167111-by-pytorch_bot_bot_
2025-12-04T09:33:41.3267840Z  * [new branch]              cherry-pick-167478-by-pytorch_bot_bot_ -> origin/cherry-pick-167478-by-pytorch_bot_bot_
2025-12-04T09:33:41.3269031Z  * [new branch]              cherry_pick_166036_166040   -> origin/cherry_pick_166036_166040
2025-12-04T09:33:41.3270278Z  * [new branch]              cherry_pick_166457          -> origin/cherry_pick_166457
2025-12-04T09:33:41.3271827Z  * [new branch]              cherrypick_166338           -> origin/cherrypick_166338
2025-12-04T09:33:41.3273068Z  * [new branch]              cherrypick_166458           -> origin/cherrypick_166458
2025-12-04T09:33:41.3274302Z  * [new branch]              cherrypick_166586           -> origin/cherrypick_166586
2025-12-04T09:33:41.3275594Z  * [new branch]              cherrypick_166956           -> origin/cherrypick_166956
2025-12-04T09:33:41.3276934Z  * [new branch]              ci_attn                     -> origin/ci_attn
2025-12-04T09:33:41.3278340Z  * [new branch]              codex-testing               -> origin/codex-testing
2025-12-04T09:33:41.3280498Z  * [new branch]              codex/add-check_memory_overlap-helper-functions -> origin/codex/add-check_memory_overlap-helper-functions
2025-12-04T09:33:41.3281534Z  * [new branch]              codex/fix-issue-121219-in-pytorch -> origin/codex/fix-issue-121219-in-pytorch
2025-12-04T09:33:41.3283600Z  * [new branch]              codex/investigate-segfaults-in-get_tensor_storage_id -> origin/codex/investigate-segfaults-in-get_tensor_storage_id
2025-12-04T09:33:41.3285011Z  * [new branch]              codex/refactor-lintrunner-config-to-use-uv-run -> origin/codex/refactor-lintrunner-config-to-use-uv-run
2025-12-04T09:33:41.3286051Z  * [new branch]              compatiblpy39util           -> origin/compatiblpy39util
2025-12-04T09:33:41.3287497Z  * [new branch]              cond_hop_device             -> origin/cond_hop_device
2025-12-04T09:33:41.3288932Z  * [new branch]              context_test                -> origin/context_test
2025-12-04T09:33:41.3290936Z  * [new branch]              copilot/code-style-cleanup-python-pip -> origin/copilot/code-style-cleanup-python-pip
2025-12-04T09:33:41.3292408Z  * [new branch]              cpio/fix_new_ami_tests      -> origin/cpio/fix_new_ami_tests
2025-12-04T09:33:41.3293959Z  * [new branch]              cpp-docs-dependency-upgrade -> origin/cpp-docs-dependency-upgrade
2025-12-04T09:33:41.3295824Z  * [new branch]              crpa/typo-in-inductor_comm_lowering -> origin/crpa/typo-in-inductor_comm_lowering
2025-12-04T09:33:41.3297383Z  * [new branch]              csl/always_produce_xml      -> origin/csl/always_produce_xml
2025-12-04T09:33:41.3298554Z  * [new branch]              csl/build_test_more_procs   -> origin/csl/build_test_more_procs
2025-12-04T09:33:41.3299823Z  * [new branch]              csl/build_test_more_procs2  -> origin/csl/build_test_more_procs2
2025-12-04T09:33:41.3301304Z  * [new branch]              csl/clean_up                -> origin/csl/clean_up
2025-12-04T09:33:41.3303138Z  * [new branch]              csl/fix_retry_segfault_exit -> origin/csl/fix_retry_segfault_exit
2025-12-04T09:33:41.3304227Z  * [new branch]              csl/katex                   -> origin/csl/katex
2025-12-04T09:33:41.3305883Z  * [new branch]              csl/larger_runner           -> origin/csl/larger_runner
2025-12-04T09:33:41.3307532Z  * [new branch]              csl/lint_testing            -> origin/csl/lint_testing
2025-12-04T09:33:41.3309166Z  * [new branch]              csl/lint_thing              -> origin/csl/lint_thing
2025-12-04T09:33:41.3310597Z  * [new branch]              csl/lintrunner_stuff        -> origin/csl/lintrunner_stuff
2025-12-04T09:33:41.3311876Z  * [new branch]              csl/manually_gen_json       -> origin/csl/manually_gen_json
2025-12-04T09:33:41.3313126Z  * [new branch]              csl/mps_sharding            -> origin/csl/mps_sharding
2025-12-04T09:33:41.3314551Z  * [new branch]              csl/multistage_docker       -> origin/csl/multistage_docker
2025-12-04T09:33:41.3315782Z  * [new branch]              csl/print_timing            -> origin/csl/print_timing
2025-12-04T09:33:41.3317058Z  * [new branch]              csl/remove_experiment       -> origin/csl/remove_experiment
2025-12-04T09:33:41.3318355Z  * [new branch]              csl/remove_maybe_unused_var -> origin/csl/remove_maybe_unused_var
2025-12-04T09:33:41.3319818Z  * [new branch]              csl/remove_repo_specific_autolabel -> origin/csl/remove_repo_specific_autolabel
2025-12-04T09:33:41.3321086Z  * [new branch]              csl/remove_run_parallel     -> origin/csl/remove_run_parallel
2025-12-04T09:33:41.3322322Z  * [new branch]              csl/remove_unused_vars      -> origin/csl/remove_unused_vars
2025-12-04T09:33:41.3323779Z  * [new branch]              csl/revert_open             -> origin/csl/revert_open
2025-12-04T09:33:41.3325013Z  * [new branch]              csl/skip_build              -> origin/csl/skip_build
2025-12-04T09:33:41.3326289Z  * [new branch]              csl/smaller_avx_amx_runenrs -> origin/csl/smaller_avx_amx_runenrs
2025-12-04T09:33:41.3327481Z  * [new branch]              csl/td_job_level            -> origin/csl/td_job_level
2025-12-04T09:33:41.3328842Z  * [new branch]              csl/test_cuda_build_large_runner -> origin/csl/test_cuda_build_large_runner
2025-12-04T09:33:41.3330270Z  * [new branch]              csl/test_owners_autograd_dispatch_nn -> origin/csl/test_owners_autograd_dispatch_nn
2025-12-04T09:33:41.3331478Z  * [new branch]              csl/test_owners_higher_confidence -> origin/csl/test_owners_higher_confidence
2025-12-04T09:33:41.3332762Z  * [new branch]              csl/upload_json_running     -> origin/csl/upload_json_running
2025-12-04T09:33:41.3334015Z  * [new branch]              csl/win_sccache             -> origin/csl/win_sccache
2025-12-04T09:33:41.3335246Z  * [new branch]              csl/xml_stuff               -> origin/csl/xml_stuff
2025-12-04T09:33:41.3336722Z  * [new branch]              cublasrelax2                -> origin/cublasrelax2
2025-12-04T09:33:41.3338519Z  * [new branch]              cuda_mempool                -> origin/cuda_mempool
2025-12-04T09:33:41.3339740Z  * [new branch]              custom_lowering_dict        -> origin/custom_lowering_dict
2025-12-04T09:33:41.3341615Z  * [new branch]              d4l3k/debug_plane_frtrace   -> origin/d4l3k/debug_plane_frtrace
2025-12-04T09:33:41.3343402Z  * [new branch]              daxia6/2.8o3                -> origin/daxia6/2.8o3
2025-12-04T09:33:41.3344599Z  * [new branch]              debug-guard                 -> origin/debug-guard
2025-12-04T09:33:41.3346094Z  * [new branch]              delete-quant-docs           -> origin/delete-quant-docs
2025-12-04T09:33:41.3350507Z  * [new branch]              dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.0
2025-12-04T09:33:41.3352122Z  * [new branch]              dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1 -> origin/dependabot/pip/dot-ci/docker/ci_commit_pins/main/transformers-4.57.1
2025-12-04T09:33:41.3353461Z  * [new branch]              desertfire/test_cpp_wrapper -> origin/desertfire/test_cpp_wrapper
2025-12-04T09:33:41.3354842Z  * [new branch]              desertfire/triton-cpu-for-aarch64 -> origin/desertfire/triton-cpu-for-aarch64
2025-12-04T09:33:41.3356916Z  * [new branch]              dev/dhruva/flex_attn_opt    -> origin/dev/dhruva/flex_attn_opt
2025-12-04T09:33:41.3359030Z  * [new branch]              dev/joona/MPSNDArrayAdd     -> origin/dev/joona/MPSNDArrayAdd
2025-12-04T09:33:41.3360612Z  * [new branch]              dev/joona/Unranked          -> origin/dev/joona/Unranked
2025-12-04T09:33:41.3362298Z  * [new branch]              dev/joona/cat               -> origin/dev/joona/cat
2025-12-04T09:33:41.3363684Z  * [new branch]              dev/joona/embeddingbag      -> origin/dev/joona/embeddingbag
2025-12-04T09:33:41.3364986Z  * [new branch]              dev/joona/fix_sdpa_memtest  -> origin/dev/joona/fix_sdpa_memtest
2025-12-04T09:33:41.3366687Z  * [new branch]              dev/joona/getTensorsString  -> origin/dev/joona/getTensorsString
2025-12-04T09:33:41.3368264Z  * [new branch]              dev/joona/mps_linear_macos14 -> origin/dev/joona/mps_linear_macos14
2025-12-04T09:33:41.3370114Z  * [new branch]              dev/joona/scalar_clamp      -> origin/dev/joona/scalar_clamp
2025-12-04T09:33:41.3371877Z  * [new branch]              dev/joona/sdpa              -> origin/dev/joona/sdpa
2025-12-04T09:33:41.3373882Z  * [new branch]              dev/joona/sdpa_api          -> origin/dev/joona/sdpa_api
2025-12-04T09:33:41.3375420Z  * [new branch]              dev/joona/type_inf          -> origin/dev/joona/type_inf
2025-12-04T09:33:41.3377035Z  * [new branch]              dev/joona/ulpAssertClose    -> origin/dev/joona/ulpAssertClose
2025-12-04T09:33:41.3378375Z  * [new branch]              dev/joona/upsize3d          -> origin/dev/joona/upsize3d
2025-12-04T09:33:41.3379619Z  * [new branch]              disp_counter                -> origin/disp_counter
2025-12-04T09:33:41.3381116Z  * [new branch]              divyanshk-patch-1           -> origin/divyanshk-patch-1
2025-12-04T09:33:41.3382244Z  * [new branch]              docs                        -> origin/docs
2025-12-04T09:33:41.3383741Z  * [new branch]              documentation               -> origin/documentation
2025-12-04T09:33:41.3384934Z  * [new branch]              eager_model_benchmarks      -> origin/eager_model_benchmarks
2025-12-04T09:33:41.3386891Z  * [new branch]              embg/test_inductor_ci_control -> origin/embg/test_inductor_ci_control
2025-12-04T09:33:41.3388030Z  * [new branch]              embg/triton_l2_prefetch_128B -> origin/embg/triton_l2_prefetch_128B
2025-12-04T09:33:41.3389209Z  * [new branch]              embg/triton_l2_prefetch_256B -> origin/embg/triton_l2_prefetch_256B
2025-12-04T09:33:41.3390537Z  * [new branch]              eqy-patch-1                 -> origin/eqy-patch-1
2025-12-04T09:33:41.3392038Z  * [new branch]              eqy-patch-2                 -> origin/eqy-patch-2
2025-12-04T09:33:41.3393433Z  * [new branch]              eqy-patch-3                 -> origin/eqy-patch-3
2025-12-04T09:33:41.3394656Z  * [new branch]              eqy-patch-4                 -> origin/eqy-patch-4
2025-12-04T09:33:41.3396101Z  * [new branch]              eqy-patch-5                 -> origin/eqy-patch-5
2025-12-04T09:33:41.3397253Z  * [new branch]              eqy-patch-6                 -> origin/eqy-patch-6
2025-12-04T09:33:41.3399219Z  * [new branch]              exclamaforte/amd-ma         -> origin/exclamaforte/amd-ma
2025-12-04T09:33:41.3400726Z  * [new branch]              exclamaforte/combo-kernels-perf-run -> origin/exclamaforte/combo-kernels-perf-run
2025-12-04T09:33:41.3402082Z  * [new branch]              exclamaforte/do_bench_refactor -> origin/exclamaforte/do_bench_refactor
2025-12-04T09:33:41.3403558Z  * [new branch]              exclamaforte/enable-mem-dep-fusion -> origin/exclamaforte/enable-mem-dep-fusion
2025-12-04T09:33:41.3404936Z  * [new branch]              exclamaforte/fix-exhaustive-autotuning -> origin/exclamaforte/fix-exhaustive-autotuning
2025-12-04T09:33:41.3406522Z  * [new branch]              exclamaforte/fix-trace-parsing-fx-svg -> origin/exclamaforte/fix-trace-parsing-fx-svg
2025-12-04T09:33:41.3408331Z  * [new branch]              exclamaforte/force-pointwise-cat-perf-run -> origin/exclamaforte/force-pointwise-cat-perf-run
2025-12-04T09:33:41.3409378Z  * [new branch]              exclamaforte/fusion-data    -> origin/exclamaforte/fusion-data
2025-12-04T09:33:41.3410997Z  * [new branch]              exclamaforte/gemm-benchmark-run -> origin/exclamaforte/gemm-benchmark-run
2025-12-04T09:33:41.3412153Z  * [new branch]              exclamaforte/gemm-export-model -> origin/exclamaforte/gemm-export-model
2025-12-04T09:33:41.3413415Z  * [new branch]              exclamaforte/gemm-model     -> origin/exclamaforte/gemm-model
2025-12-04T09:33:41.3415674Z  * [new branch]              exclamaforte/gemm-model-all-data-collection -> origin/exclamaforte/gemm-model-all-data-collection
2025-12-04T09:33:41.3416652Z  * [new branch]              exclamaforte/gemm-to-amd    -> origin/exclamaforte/gemm-to-amd
2025-12-04T09:33:41.3417742Z  * [new branch]              exclamaforte/just-gemm-model -> origin/exclamaforte/just-gemm-model
2025-12-04T09:33:41.3419271Z  * [new branch]              exclamaforte/just-gemm-model-no-refactor -> origin/exclamaforte/just-gemm-model-no-refactor
2025-12-04T09:33:41.3420583Z  * [new branch]              exclamaforte/profile-diff-algo -> origin/exclamaforte/profile-diff-algo
2025-12-04T09:33:41.3421936Z  * [new branch]              exclamaforte/profiler-visualization -> origin/exclamaforte/profiler-visualization
2025-12-04T09:33:41.3423228Z  * [new branch]              exclamaforte/test_cpp_wrapper_mode -> origin/exclamaforte/test_cpp_wrapper_mode
2025-12-04T09:33:41.3424655Z  * [new branch]              exclamaforte/update-autotune-configs -> origin/exclamaforte/update-autotune-configs
2025-12-04T09:33:41.3426015Z  * [new branch]              exclamaforte/update-autotune-configs-2 -> origin/exclamaforte/update-autotune-configs-2
2025-12-04T09:33:41.3427169Z  * [new branch]              exec                        -> origin/exec
2025-12-04T09:33:41.3428834Z  * [new branch]              experimental-mosaic         -> origin/experimental-mosaic
2025-12-04T09:33:41.3430118Z  * [new branch]              export-D61047529            -> origin/export-D61047529
2025-12-04T09:33:41.3431540Z  * [new branch]              export-D71412006            -> origin/export-D71412006
2025-12-04T09:33:41.3433001Z  * [new branch]              export-D73042989            -> origin/export-D73042989
2025-12-04T09:33:41.3434212Z  * [new branch]              export-D78957093            -> origin/export-D78957093
2025-12-04T09:33:41.3435505Z  * [new branch]              export-D78996107            -> origin/export-D78996107
2025-12-04T09:33:41.3436788Z  * [new branch]              export-D80823877            -> origin/export-D80823877
2025-12-04T09:33:41.3438300Z  * [new branch]              export-D80958642            -> origin/export-D80958642
2025-12-04T09:33:41.3439520Z  * [new branch]              export-D81054193            -> origin/export-D81054193
2025-12-04T09:33:41.3440776Z  * [new branch]              export-D81204584            -> origin/export-D81204584
2025-12-04T09:33:41.3442056Z  * [new branch]              export-D81429090            -> origin/export-D81429090
2025-12-04T09:33:41.3443866Z  * [new branch]              export-D82250826            -> origin/export-D82250826
2025-12-04T09:33:41.3445195Z  * [new branch]              export-D82253817            -> origin/export-D82253817
2025-12-04T09:33:41.3446460Z  * [new branch]              export-D83541846            -> origin/export-D83541846
2025-12-04T09:33:41.3447855Z  * [new branch]              export-D83627170            -> origin/export-D83627170
2025-12-04T09:33:41.3449075Z  * [new branch]              export-D83766701            -> origin/export-D83766701
2025-12-04T09:33:41.3450380Z  * [new branch]              export-D83768878            -> origin/export-D83768878
2025-12-04T09:33:41.3451805Z  * [new branch]              export-D83769447            -> origin/export-D83769447
2025-12-04T09:33:41.3453018Z  * [new branch]              export-D84089824            -> origin/export-D84089824
2025-12-04T09:33:41.3454281Z  * [new branch]              export-D84213020            -> origin/export-D84213020
2025-12-04T09:33:41.3456313Z  * [new branch]              export-D84373821            -> origin/export-D84373821
2025-12-04T09:33:41.3457658Z  * [new branch]              export-D84612194            -> origin/export-D84612194
2025-12-04T09:33:41.3458888Z  * [new branch]              export-D84890985            -> origin/export-D84890985
2025-12-04T09:33:41.3460161Z  * [new branch]              export-D85122326            -> origin/export-D85122326
2025-12-04T09:33:41.3461621Z  * [new branch]              export-D86256198            -> origin/export-D86256198
2025-12-04T09:33:41.3462830Z  * [new branch]              export-D86460608            -> origin/export-D86460608
2025-12-04T09:33:41.3464391Z  * [new branch]              export-D86474796            -> origin/export-D86474796
2025-12-04T09:33:41.3465873Z  * [new branch]              export-D86712396            -> origin/export-D86712396
2025-12-04T09:33:41.3467126Z  * [new branch]              export-D87022129            -> origin/export-D87022129
2025-12-04T09:33:41.3468583Z  * [new branch]              export-D87838959            -> origin/export-D87838959
2025-12-04T09:33:41.3469998Z  * [new branch]              export-D88319437            -> origin/export-D88319437
2025-12-04T09:33:41.3471538Z  * [new branch]              exported-model-train-idempotent -> origin/exported-model-train-idempotent
2025-12-04T09:33:41.3472745Z  * [new branch]              ezyang-titan-october        -> origin/ezyang-titan-october
2025-12-04T09:33:41.3474011Z  * [new branch]              ezyang-titan-october2       -> origin/ezyang-titan-october2
2025-12-04T09:33:41.3475245Z  * [new branch]              ezyang-war                  -> origin/ezyang-war
2025-12-04T09:33:41.3477252Z  * [new branch]              ezyang/wip-aot-descriptors  -> origin/ezyang/wip-aot-descriptors
2025-12-04T09:33:41.3478293Z  * [new branch]              fa_u8_brgemm                -> origin/fa_u8_brgemm
2025-12-04T09:33:41.3480235Z  * [new branch]              fadeputr/sequence_fbgemm    -> origin/fadeputr/sequence_fbgemm
2025-12-04T09:33:41.3481461Z  * [new branch]              fastmath_baseline           -> origin/fastmath_baseline
2025-12-04T09:33:41.3483587Z  * [new branch]              fbcode/warm                 -> origin/fbcode/warm
2025-12-04T09:33:41.3485011Z  * [new branch]              fca                         -> origin/fca
2025-12-04T09:33:41.3486229Z  * [new branch]              fca2_ca5984c                -> origin/fca2_ca5984c
2025-12-04T09:33:41.3487676Z  * [new branch]              fca5                        -> origin/fca5
2025-12-04T09:33:41.3490015Z  * [new branch]              feature/justknobs-cpp       -> origin/feature/justknobs-cpp
2025-12-04T09:33:41.3491284Z  * [new branch]              feature/numa-forkserver     -> origin/feature/numa-forkserver
2025-12-04T09:33:41.3493139Z  * [new branch]              ffast_math_baseline         -> origin/ffast_math_baseline
2025-12-04T09:33:41.3494318Z  * [new branch]              ffast_math_target           -> origin/ffast_math_target
2025-12-04T09:33:41.3496241Z  * [new branch]              findhao/base_commit         -> origin/findhao/base_commit
2025-12-04T09:33:41.3497473Z  * [new branch]              findhao/base_commit1        -> origin/findhao/base_commit1
2025-12-04T09:33:41.3498777Z  * [new branch]              findhao/multistream2        -> origin/findhao/multistream2
2025-12-04T09:33:41.3499980Z  * [new branch]              findhao/multistream5        -> origin/findhao/multistream5
2025-12-04T09:33:41.3501369Z  * [new branch]              findhao/multistream6        -> origin/findhao/multistream6
2025-12-04T09:33:41.3502868Z  * [new branch]              findhao/operatorbench3      -> origin/findhao/operatorbench3
2025-12-04T09:33:41.3504014Z  * [new branch]              findhao/operatorbench5      -> origin/findhao/operatorbench5
2025-12-04T09:33:41.3505158Z  * [new branch]              findhao/tritonparse         -> origin/findhao/tritonparse
2025-12-04T09:33:41.3506675Z  * [new branch]              fix-ck-gemm-template-format -> origin/fix-ck-gemm-template-format
2025-12-04T09:33:41.3508052Z  * [new branch]              fix-config-ignore           -> origin/fix-config-ignore
2025-12-04T09:33:41.3509170Z  * [new branch]              fix-dict-guard              -> origin/fix-dict-guard
2025-12-04T09:33:41.3510635Z  * [new branch]              fix_addmm_issue             -> origin/fix_addmm_issue
2025-12-04T09:33:41.3512427Z  * [new branch]              fix_amd_missing_cluster_dims -> origin/fix_amd_missing_cluster_dims
2025-12-04T09:33:41.3513658Z  * [new branch]              fix_bench_bwd_pass          -> origin/fix_bench_bwd_pass
2025-12-04T09:33:41.3514879Z  * [new branch]              fix_mem_profiler_config     -> origin/fix_mem_profiler_config
2025-12-04T09:33:41.3516108Z  * [new branch]              fix_nvrtc_discovery         -> origin/fix_nvrtc_discovery
2025-12-04T09:33:41.3517366Z  * [new branch]              fix_op_runner               -> origin/fix_op_runner
2025-12-04T09:33:41.3518801Z  * [new branch]              fix_ubn_159469              -> origin/fix_ubn_159469
2025-12-04T09:33:41.3520163Z  * [new branch]              fixes-triage                -> origin/fixes-triage
2025-12-04T09:33:41.3521374Z  * [new branch]              fixflashinfer               -> origin/fixflashinfer
2025-12-04T09:33:41.3522844Z  * [new branch]              flash_decoding_cpu          -> origin/flash_decoding_cpu
2025-12-04T09:33:41.3524481Z  * [new branch]              flex-flash                  -> origin/flex-flash
2025-12-04T09:33:41.3525911Z  * [new branch]              flex_attention_functorch_grad -> origin/flex_attention_functorch_grad
2025-12-04T09:33:41.3527088Z  * [new branch]              flex_flash                  -> origin/flex_flash
2025-12-04T09:33:41.3529105Z  * [new branch]              fmassa/fix_memeff_sharding_rule -> origin/fmassa/fix_memeff_sharding_rule
2025-12-04T09:33:41.3530360Z  * [new branch]              fmassa/tests_comm_compute_scheduler -> origin/fmassa/tests_comm_compute_scheduler
2025-12-04T09:33:41.3531620Z  * [new branch]              forkserver_fix              -> origin/forkserver_fix
2025-12-04T09:33:41.3533011Z  * [new branch]              fsdp2_trace_rules           -> origin/fsdp2_trace_rules
2025-12-04T09:33:41.3534367Z  * [new branch]              fx_cpp                      -> origin/fx_cpp
2025-12-04T09:33:41.3536183Z  * [new branch]              fy/fix-win                  -> origin/fy/fix-win
2025-12-04T09:33:41.3537646Z  * [new branch]              galv-patch-1                -> origin/galv-patch-1
2025-12-04T09:33:41.3539812Z  * [new branch]              galv/cudagraphs-conditional-nodes-4 -> origin/galv/cudagraphs-conditional-nodes-4
2025-12-04T09:33:41.3541528Z  * [new branch]              georgehong/cmakelists-patch -> origin/georgehong/cmakelists-patch
2025-12-04T09:33:41.3544319Z  * [new branch]              gh/AlnisM/1/base            -> origin/gh/AlnisM/1/base
2025-12-04T09:33:41.3545551Z  * [new branch]              gh/AlnisM/1/head            -> origin/gh/AlnisM/1/head
2025-12-04T09:33:41.3547899Z  * [new branch]              gh/EikanWang/67/base        -> origin/gh/EikanWang/67/base
2025-12-04T09:33:41.3549097Z  * [new branch]              gh/EikanWang/67/head        -> origin/gh/EikanWang/67/head
2025-12-04T09:33:41.3551747Z  * [new branch]              gh/Gasoonjia/1/base         -> origin/gh/Gasoonjia/1/base
2025-12-04T09:33:41.3553004Z  * [new branch]              gh/Gasoonjia/1/head         -> origin/gh/Gasoonjia/1/head
2025-12-04T09:33:41.3555341Z  * [new branch]              gh/H-Huang/131/base         -> origin/gh/H-Huang/131/base
2025-12-04T09:33:41.3556589Z  * [new branch]              gh/H-Huang/131/head         -> origin/gh/H-Huang/131/head
2025-12-04T09:33:41.3557946Z  * [new branch]              gh/H-Huang/131/orig         -> origin/gh/H-Huang/131/orig
2025-12-04T09:33:41.3559727Z  * [new branch]              gh/H-Huang/132/base         -> origin/gh/H-Huang/132/base
2025-12-04T09:33:41.3560941Z  * [new branch]              gh/H-Huang/132/head         -> origin/gh/H-Huang/132/head
2025-12-04T09:33:41.3562394Z  * [new branch]              gh/H-Huang/132/orig         -> origin/gh/H-Huang/132/orig
2025-12-04T09:33:41.3585649Z  * [new branch]              gh/H-Huang/180/base         -> origin/gh/H-Huang/180/base
2025-12-04T09:33:41.3586331Z  * [new branch]              gh/H-Huang/180/head         -> origin/gh/H-Huang/180/head
2025-12-04T09:33:41.3586967Z  * [new branch]              gh/H-Huang/180/orig         -> origin/gh/H-Huang/180/orig
2025-12-04T09:33:41.3587591Z  * [new branch]              gh/H-Huang/182/base         -> origin/gh/H-Huang/182/base
2025-12-04T09:33:41.3588324Z  * [new branch]              gh/H-Huang/182/head         -> origin/gh/H-Huang/182/head
2025-12-04T09:33:41.3588951Z  * [new branch]              gh/H-Huang/182/orig         -> origin/gh/H-Huang/182/orig
2025-12-04T09:33:41.3589570Z  * [new branch]              gh/H-Huang/226/base         -> origin/gh/H-Huang/226/base
2025-12-04T09:33:41.3590175Z  * [new branch]              gh/H-Huang/226/head         -> origin/gh/H-Huang/226/head
2025-12-04T09:33:41.3590791Z  * [new branch]              gh/H-Huang/226/orig         -> origin/gh/H-Huang/226/orig
2025-12-04T09:33:41.3591418Z  * [new branch]              gh/H-Huang/228/base         -> origin/gh/H-Huang/228/base
2025-12-04T09:33:41.3592037Z  * [new branch]              gh/H-Huang/228/head         -> origin/gh/H-Huang/228/head
2025-12-04T09:33:41.3592642Z  * [new branch]              gh/H-Huang/228/orig         -> origin/gh/H-Huang/228/orig
2025-12-04T09:33:41.3593386Z  * [new branch]              gh/IvanKobzarev/150/base    -> origin/gh/IvanKobzarev/150/base
2025-12-04T09:33:41.3594103Z  * [new branch]              gh/IvanKobzarev/150/head    -> origin/gh/IvanKobzarev/150/head
2025-12-04T09:33:41.3594801Z  * [new branch]              gh/IvanKobzarev/150/orig    -> origin/gh/IvanKobzarev/150/orig
2025-12-04T09:33:41.3595515Z  * [new branch]              gh/IvanKobzarev/157/base    -> origin/gh/IvanKobzarev/157/base
2025-12-04T09:33:41.3596222Z  * [new branch]              gh/IvanKobzarev/157/head    -> origin/gh/IvanKobzarev/157/head
2025-12-04T09:33:41.3596944Z  * [new branch]              gh/IvanKobzarev/157/orig    -> origin/gh/IvanKobzarev/157/orig
2025-12-04T09:33:41.3597639Z  * [new branch]              gh/IvanKobzarev/159/base    -> origin/gh/IvanKobzarev/159/base
2025-12-04T09:33:41.3598351Z  * [new branch]              gh/IvanKobzarev/159/head    -> origin/gh/IvanKobzarev/159/head
2025-12-04T09:33:41.3599064Z  * [new branch]              gh/IvanKobzarev/159/orig    -> origin/gh/IvanKobzarev/159/orig
2025-12-04T09:33:41.3599771Z  * [new branch]              gh/IvanKobzarev/162/base    -> origin/gh/IvanKobzarev/162/base
2025-12-04T09:33:41.3600466Z  * [new branch]              gh/IvanKobzarev/162/head    -> origin/gh/IvanKobzarev/162/head
2025-12-04T09:33:41.3601395Z  * [new branch]              gh/IvanKobzarev/162/orig    -> origin/gh/IvanKobzarev/162/orig
2025-12-04T09:33:41.3602104Z  * [new branch]              gh/IvanKobzarev/163/base    -> origin/gh/IvanKobzarev/163/base
2025-12-04T09:33:41.3602887Z  * [new branch]              gh/IvanKobzarev/163/head    -> origin/gh/IvanKobzarev/163/head
2025-12-04T09:33:41.3603585Z  * [new branch]              gh/IvanKobzarev/163/orig    -> origin/gh/IvanKobzarev/163/orig
2025-12-04T09:33:41.3605346Z  * [new branch]              gh/IvanKobzarev/166/base    -> origin/gh/IvanKobzarev/166/base
2025-12-04T09:33:41.3606506Z  * [new branch]              gh/IvanKobzarev/166/head    -> origin/gh/IvanKobzarev/166/head
2025-12-04T09:33:41.3607813Z  * [new branch]              gh/IvanKobzarev/166/orig    -> origin/gh/IvanKobzarev/166/orig
2025-12-04T09:33:41.3609717Z  * [new branch]              gh/IvanKobzarev/167/base    -> origin/gh/IvanKobzarev/167/base
2025-12-04T09:33:41.3610868Z  * [new branch]              gh/IvanKobzarev/167/head    -> origin/gh/IvanKobzarev/167/head
2025-12-04T09:33:41.3612165Z  * [new branch]              gh/IvanKobzarev/167/orig    -> origin/gh/IvanKobzarev/167/orig
2025-12-04T09:33:41.3614001Z  * [new branch]              gh/IvanKobzarev/168/base    -> origin/gh/IvanKobzarev/168/base
2025-12-04T09:33:41.3615317Z  * [new branch]              gh/IvanKobzarev/168/head    -> origin/gh/IvanKobzarev/168/head
2025-12-04T09:33:41.3616442Z  * [new branch]              gh/IvanKobzarev/168/orig    -> origin/gh/IvanKobzarev/168/orig
2025-12-04T09:33:41.3618336Z  * [new branch]              gh/IvanKobzarev/169/base    -> origin/gh/IvanKobzarev/169/base
2025-12-04T09:33:41.3619571Z  * [new branch]              gh/IvanKobzarev/169/head    -> origin/gh/IvanKobzarev/169/head
2025-12-04T09:33:41.3620834Z  * [new branch]              gh/IvanKobzarev/169/orig    -> origin/gh/IvanKobzarev/169/orig
2025-12-04T09:33:41.3622580Z  * [new branch]              gh/IvanKobzarev/170/base    -> origin/gh/IvanKobzarev/170/base
2025-12-04T09:33:41.3623776Z  * [new branch]              gh/IvanKobzarev/170/head    -> origin/gh/IvanKobzarev/170/head
2025-12-04T09:33:41.3625055Z  * [new branch]              gh/IvanKobzarev/170/orig    -> origin/gh/IvanKobzarev/170/orig
2025-12-04T09:33:41.3627154Z  * [new branch]              gh/IvanKobzarev/171/base    -> origin/gh/IvanKobzarev/171/base
2025-12-04T09:33:41.3628404Z  * [new branch]              gh/IvanKobzarev/171/head    -> origin/gh/IvanKobzarev/171/head
2025-12-04T09:33:41.3629712Z  * [new branch]              gh/IvanKobzarev/171/orig    -> origin/gh/IvanKobzarev/171/orig
2025-12-04T09:33:41.3631542Z  * [new branch]              gh/IvanKobzarev/172/base    -> origin/gh/IvanKobzarev/172/base
2025-12-04T09:33:41.3632878Z  * [new branch]              gh/IvanKobzarev/172/head    -> origin/gh/IvanKobzarev/172/head
2025-12-04T09:33:41.3634138Z  * [new branch]              gh/IvanKobzarev/172/orig    -> origin/gh/IvanKobzarev/172/orig
2025-12-04T09:33:41.3635977Z  * [new branch]              gh/IvanKobzarev/173/base    -> origin/gh/IvanKobzarev/173/base
2025-12-04T09:33:41.3637164Z  * [new branch]              gh/IvanKobzarev/173/head    -> origin/gh/IvanKobzarev/173/head
2025-12-04T09:33:41.3638479Z  * [new branch]              gh/IvanKobzarev/173/orig    -> origin/gh/IvanKobzarev/173/orig
2025-12-04T09:33:41.3640385Z  * [new branch]              gh/IvanKobzarev/174/base    -> origin/gh/IvanKobzarev/174/base
2025-12-04T09:33:41.3641651Z  * [new branch]              gh/IvanKobzarev/174/head    -> origin/gh/IvanKobzarev/174/head
2025-12-04T09:33:41.3643083Z  * [new branch]              gh/IvanKobzarev/174/orig    -> origin/gh/IvanKobzarev/174/orig
2025-12-04T09:33:41.3644957Z  * [new branch]              gh/IvanKobzarev/175/base    -> origin/gh/IvanKobzarev/175/base
2025-12-04T09:33:41.3646308Z  * [new branch]              gh/IvanKobzarev/175/head    -> origin/gh/IvanKobzarev/175/head
2025-12-04T09:33:41.3647671Z  * [new branch]              gh/IvanKobzarev/175/orig    -> origin/gh/IvanKobzarev/175/orig
2025-12-04T09:33:41.3649642Z  * [new branch]              gh/IvanKobzarev/176/base    -> origin/gh/IvanKobzarev/176/base
2025-12-04T09:33:41.3650924Z  * [new branch]              gh/IvanKobzarev/176/head    -> origin/gh/IvanKobzarev/176/head
2025-12-04T09:33:41.3652171Z  * [new branch]              gh/IvanKobzarev/176/orig    -> origin/gh/IvanKobzarev/176/orig
2025-12-04T09:33:41.3654364Z  * [new branch]              gh/IvanKobzarev/177/base    -> origin/gh/IvanKobzarev/177/base
2025-12-04T09:33:41.3655927Z  * [new branch]              gh/IvanKobzarev/177/head    -> origin/gh/IvanKobzarev/177/head
2025-12-04T09:33:41.3657194Z  * [new branch]              gh/IvanKobzarev/177/orig    -> origin/gh/IvanKobzarev/177/orig
2025-12-04T09:33:41.3659186Z  * [new branch]              gh/IvanKobzarev/178/base    -> origin/gh/IvanKobzarev/178/base
2025-12-04T09:33:41.3660554Z  * [new branch]              gh/IvanKobzarev/178/head    -> origin/gh/IvanKobzarev/178/head
2025-12-04T09:33:41.3661868Z  * [new branch]              gh/IvanKobzarev/178/orig    -> origin/gh/IvanKobzarev/178/orig
2025-12-04T09:33:41.3663792Z  * [new branch]              gh/IvanKobzarev/179/base    -> origin/gh/IvanKobzarev/179/base
2025-12-04T09:33:41.3664922Z  * [new branch]              gh/IvanKobzarev/179/head    -> origin/gh/IvanKobzarev/179/head
2025-12-04T09:33:41.3666302Z  * [new branch]              gh/IvanKobzarev/179/orig    -> origin/gh/IvanKobzarev/179/orig
2025-12-04T09:33:41.3668104Z  * [new branch]              gh/IvanKobzarev/180/base    -> origin/gh/IvanKobzarev/180/base
2025-12-04T09:33:41.3669317Z  * [new branch]              gh/IvanKobzarev/180/head    -> origin/gh/IvanKobzarev/180/head
2025-12-04T09:33:41.3670584Z  * [new branch]              gh/IvanKobzarev/180/orig    -> origin/gh/IvanKobzarev/180/orig
2025-12-04T09:33:41.3672650Z  * [new branch]              gh/IvanKobzarev/181/base    -> origin/gh/IvanKobzarev/181/base
2025-12-04T09:33:41.3673912Z  * [new branch]              gh/IvanKobzarev/181/head    -> origin/gh/IvanKobzarev/181/head
2025-12-04T09:33:41.3675237Z  * [new branch]              gh/IvanKobzarev/181/orig    -> origin/gh/IvanKobzarev/181/orig
2025-12-04T09:33:41.3677341Z  * [new branch]              gh/IvanKobzarev/182/base    -> origin/gh/IvanKobzarev/182/base
2025-12-04T09:33:41.3678555Z  * [new branch]              gh/IvanKobzarev/182/head    -> origin/gh/IvanKobzarev/182/head
2025-12-04T09:33:41.3679830Z  * [new branch]              gh/IvanKobzarev/182/orig    -> origin/gh/IvanKobzarev/182/orig
2025-12-04T09:33:41.3681920Z  * [new branch]              gh/IvanKobzarev/183/base    -> origin/gh/IvanKobzarev/183/base
2025-12-04T09:33:41.3683368Z  * [new branch]              gh/IvanKobzarev/183/head    -> origin/gh/IvanKobzarev/183/head
2025-12-04T09:33:41.3684727Z  * [new branch]              gh/IvanKobzarev/183/orig    -> origin/gh/IvanKobzarev/183/orig
2025-12-04T09:33:41.3686605Z  * [new branch]              gh/IvanKobzarev/184/base    -> origin/gh/IvanKobzarev/184/base
2025-12-04T09:33:41.3687863Z  * [new branch]              gh/IvanKobzarev/184/head    -> origin/gh/IvanKobzarev/184/head
2025-12-04T09:33:41.3689162Z  * [new branch]              gh/IvanKobzarev/184/orig    -> origin/gh/IvanKobzarev/184/orig
2025-12-04T09:33:41.3691386Z  * [new branch]              gh/NikhilAPatel/1/base      -> origin/gh/NikhilAPatel/1/base
2025-12-04T09:33:41.3692764Z  * [new branch]              gh/NikhilAPatel/1/head      -> origin/gh/NikhilAPatel/1/head
2025-12-04T09:33:41.3694395Z  * [new branch]              gh/NikhilAPatel/2/base      -> origin/gh/NikhilAPatel/2/base
2025-12-04T09:33:41.3695566Z  * [new branch]              gh/NikhilAPatel/2/head      -> origin/gh/NikhilAPatel/2/head
2025-12-04T09:33:41.3697736Z  * [new branch]              gh/NikhilAPatel/4/base      -> origin/gh/NikhilAPatel/4/base
2025-12-04T09:33:41.3699210Z  * [new branch]              gh/NikhilAPatel/4/head      -> origin/gh/NikhilAPatel/4/head
2025-12-04T09:33:41.3701123Z  * [new branch]              gh/NikhilAPatel/5/base      -> origin/gh/NikhilAPatel/5/base
2025-12-04T09:33:41.3702496Z  * [new branch]              gh/NikhilAPatel/5/head      -> origin/gh/NikhilAPatel/5/head
2025-12-04T09:33:41.3703816Z  * [new branch]              gh/NikhilAPatel/5/orig      -> origin/gh/NikhilAPatel/5/orig
2025-12-04T09:33:41.3705948Z  * [new branch]              gh/PaliC/17/base            -> origin/gh/PaliC/17/base
2025-12-04T09:33:41.3707189Z  * [new branch]              gh/PaliC/17/head            -> origin/gh/PaliC/17/head
2025-12-04T09:33:41.3708573Z  * [new branch]              gh/PaliC/17/orig            -> origin/gh/PaliC/17/orig
2025-12-04T09:33:41.3710315Z  * [new branch]              gh/PaliC/18/base            -> origin/gh/PaliC/18/base
2025-12-04T09:33:41.3711522Z  * [new branch]              gh/PaliC/18/head            -> origin/gh/PaliC/18/head
2025-12-04T09:33:41.3712957Z  * [new branch]              gh/PaliC/18/orig            -> origin/gh/PaliC/18/orig
2025-12-04T09:33:41.3714642Z  * [new branch]              gh/PaliC/20/base            -> origin/gh/PaliC/20/base
2025-12-04T09:33:41.3715855Z  * [new branch]              gh/PaliC/20/head            -> origin/gh/PaliC/20/head
2025-12-04T09:33:41.3717140Z  * [new branch]              gh/PaliC/20/orig            -> origin/gh/PaliC/20/orig
2025-12-04T09:33:41.3718904Z  * [new branch]              gh/PaliC/21/base            -> origin/gh/PaliC/21/base
2025-12-04T09:33:41.3720266Z  * [new branch]              gh/PaliC/21/head            -> origin/gh/PaliC/21/head
2025-12-04T09:33:41.3721391Z  * [new branch]              gh/PaliC/21/orig            -> origin/gh/PaliC/21/orig
2025-12-04T09:33:41.3723288Z  * [new branch]              gh/PaliC/23/base            -> origin/gh/PaliC/23/base
2025-12-04T09:33:41.3724450Z  * [new branch]              gh/PaliC/23/head            -> origin/gh/PaliC/23/head
2025-12-04T09:33:41.3725805Z  * [new branch]              gh/PaliC/23/orig            -> origin/gh/PaliC/23/orig
2025-12-04T09:33:41.3727538Z  * [new branch]              gh/PaliC/24/base            -> origin/gh/PaliC/24/base
2025-12-04T09:33:41.3728686Z  * [new branch]              gh/PaliC/24/head            -> origin/gh/PaliC/24/head
2025-12-04T09:33:41.3729919Z  * [new branch]              gh/PaliC/24/orig            -> origin/gh/PaliC/24/orig
2025-12-04T09:33:41.3731685Z  * [new branch]              gh/PaliC/25/head            -> origin/gh/PaliC/25/head
2025-12-04T09:33:41.3732892Z  * [new branch]              gh/PaliC/25/next            -> origin/gh/PaliC/25/next
2025-12-04T09:33:41.3734272Z  * [new branch]              gh/PaliC/25/orig            -> origin/gh/PaliC/25/orig
2025-12-04T09:33:41.3735943Z  * [new branch]              gh/PaliC/26/head            -> origin/gh/PaliC/26/head
2025-12-04T09:33:41.3737390Z  * [new branch]              gh/PaliC/26/next            -> origin/gh/PaliC/26/next
2025-12-04T09:33:41.3738674Z  * [new branch]              gh/PaliC/26/orig            -> origin/gh/PaliC/26/orig
2025-12-04T09:33:41.3740481Z  * [new branch]              gh/PaliC/27/next            -> origin/gh/PaliC/27/next
2025-12-04T09:33:41.3742132Z  * [new branch]              gh/PaliC/28/head            -> origin/gh/PaliC/28/head
2025-12-04T09:33:41.3743175Z  * [new branch]              gh/PaliC/28/next            -> origin/gh/PaliC/28/next
2025-12-04T09:33:41.3744594Z  * [new branch]              gh/PaliC/28/orig            -> origin/gh/PaliC/28/orig
2025-12-04T09:33:41.3746294Z  * [new branch]              gh/PaliC/29/head            -> origin/gh/PaliC/29/head
2025-12-04T09:33:41.3747316Z  * [new branch]              gh/PaliC/29/next            -> origin/gh/PaliC/29/next
2025-12-04T09:33:41.3748583Z  * [new branch]              gh/PaliC/29/orig            -> origin/gh/PaliC/29/orig
2025-12-04T09:33:41.3750447Z  * [new branch]              gh/PaliC/30/head            -> origin/gh/PaliC/30/head
2025-12-04T09:33:41.3751481Z  * [new branch]              gh/PaliC/30/next            -> origin/gh/PaliC/30/next
2025-12-04T09:33:41.3752862Z  * [new branch]              gh/PaliC/30/orig            -> origin/gh/PaliC/30/orig
2025-12-04T09:33:41.3754516Z  * [new branch]              gh/PaliC/31/head            -> origin/gh/PaliC/31/head
2025-12-04T09:33:41.3755605Z  * [new branch]              gh/PaliC/31/next            -> origin/gh/PaliC/31/next
2025-12-04T09:33:41.3757402Z  * [new branch]              gh/PaliC/31/orig            -> origin/gh/PaliC/31/orig
2025-12-04T09:33:41.3759496Z  * [new branch]              gh/PaulZhang12/25/base      -> origin/gh/PaulZhang12/25/base
2025-12-04T09:33:41.3760800Z  * [new branch]              gh/PaulZhang12/25/head      -> origin/gh/PaulZhang12/25/head
2025-12-04T09:33:41.3762123Z  * [new branch]              gh/PaulZhang12/25/orig      -> origin/gh/PaulZhang12/25/orig
2025-12-04T09:33:41.3764130Z  * [new branch]              gh/PaulZhang12/28/base      -> origin/gh/PaulZhang12/28/base
2025-12-04T09:33:41.3765412Z  * [new branch]              gh/PaulZhang12/28/head      -> origin/gh/PaulZhang12/28/head
2025-12-04T09:33:41.3766696Z  * [new branch]              gh/PaulZhang12/28/orig      -> origin/gh/PaulZhang12/28/orig
2025-12-04T09:33:41.3768871Z  * [new branch]              gh/PaulZhang12/31/base      -> origin/gh/PaulZhang12/31/base
2025-12-04T09:33:41.3772108Z  * [new branch]              gh/PaulZhang12/31/head      -> origin/gh/PaulZhang12/31/head
2025-12-04T09:33:41.3773494Z  * [new branch]              gh/PaulZhang12/31/orig      -> origin/gh/PaulZhang12/31/orig
2025-12-04T09:33:41.3774170Z  * [new branch]              gh/PaulZhang12/37/base      -> origin/gh/PaulZhang12/37/base
2025-12-04T09:33:41.3774865Z  * [new branch]              gh/PaulZhang12/37/head      -> origin/gh/PaulZhang12/37/head
2025-12-04T09:33:41.3775569Z  * [new branch]              gh/PaulZhang12/37/orig      -> origin/gh/PaulZhang12/37/orig
2025-12-04T09:33:41.3777460Z  * [new branch]              gh/PaulZhang12/40/base      -> origin/gh/PaulZhang12/40/base
2025-12-04T09:33:41.3778634Z  * [new branch]              gh/PaulZhang12/40/head      -> origin/gh/PaulZhang12/40/head
2025-12-04T09:33:41.3779880Z  * [new branch]              gh/PaulZhang12/40/orig      -> origin/gh/PaulZhang12/40/orig
2025-12-04T09:33:41.3781718Z  * [new branch]              gh/PaulZhang12/42/base      -> origin/gh/PaulZhang12/42/base
2025-12-04T09:33:41.3782952Z  * [new branch]              gh/PaulZhang12/42/head      -> origin/gh/PaulZhang12/42/head
2025-12-04T09:33:41.3784785Z  * [new branch]              gh/PaulZhang12/43/base      -> origin/gh/PaulZhang12/43/base
2025-12-04T09:33:41.3786021Z  * [new branch]              gh/PaulZhang12/43/head      -> origin/gh/PaulZhang12/43/head
2025-12-04T09:33:41.3787290Z  * [new branch]              gh/PaulZhang12/43/orig      -> origin/gh/PaulZhang12/43/orig
2025-12-04T09:33:41.3788981Z  * [new branch]              gh/PaulZhang12/44/base      -> origin/gh/PaulZhang12/44/base
2025-12-04T09:33:41.3790158Z  * [new branch]              gh/PaulZhang12/44/head      -> origin/gh/PaulZhang12/44/head
2025-12-04T09:33:41.3792045Z  * [new branch]              gh/PaulZhang12/45/base      -> origin/gh/PaulZhang12/45/base
2025-12-04T09:33:41.3793203Z  * [new branch]              gh/PaulZhang12/45/head      -> origin/gh/PaulZhang12/45/head
2025-12-04T09:33:41.3794396Z  * [new branch]              gh/PaulZhang12/45/orig      -> origin/gh/PaulZhang12/45/orig
2025-12-04T09:33:41.3796207Z  * [new branch]              gh/PaulZhang12/46/base      -> origin/gh/PaulZhang12/46/base
2025-12-04T09:33:41.3797620Z  * [new branch]              gh/PaulZhang12/46/head      -> origin/gh/PaulZhang12/46/head
2025-12-04T09:33:41.3799165Z  * [new branch]              gh/PaulZhang12/46/orig      -> origin/gh/PaulZhang12/46/orig
2025-12-04T09:33:41.3801160Z  * [new branch]              gh/PaulZhang12/47/base      -> origin/gh/PaulZhang12/47/base
2025-12-04T09:33:41.3802802Z  * [new branch]              gh/PaulZhang12/47/head      -> origin/gh/PaulZhang12/47/head
2025-12-04T09:33:41.3803997Z  * [new branch]              gh/PaulZhang12/47/orig      -> origin/gh/PaulZhang12/47/orig
2025-12-04T09:33:41.3805645Z  * [new branch]              gh/PaulZhang12/48/base      -> origin/gh/PaulZhang12/48/base
2025-12-04T09:33:41.3806867Z  * [new branch]              gh/PaulZhang12/48/head      -> origin/gh/PaulZhang12/48/head
2025-12-04T09:33:41.3808123Z  * [new branch]              gh/PaulZhang12/48/orig      -> origin/gh/PaulZhang12/48/orig
2025-12-04T09:33:41.3810227Z  * [new branch]              gh/SamGinzburg/11/base      -> origin/gh/SamGinzburg/11/base
2025-12-04T09:33:41.3811447Z  * [new branch]              gh/SamGinzburg/11/head      -> origin/gh/SamGinzburg/11/head
2025-12-04T09:33:41.3813824Z  * [new branch]              gh/SherlockNoMad/1/base     -> origin/gh/SherlockNoMad/1/base
2025-12-04T09:33:41.3815129Z  * [new branch]              gh/SherlockNoMad/1/head     -> origin/gh/SherlockNoMad/1/head
2025-12-04T09:33:41.3817011Z  * [new branch]              gh/SherlockNoMad/10/base    -> origin/gh/SherlockNoMad/10/base
2025-12-04T09:33:41.3818246Z  * [new branch]              gh/SherlockNoMad/10/head    -> origin/gh/SherlockNoMad/10/head
2025-12-04T09:33:41.3819630Z  * [new branch]              gh/SherlockNoMad/10/orig    -> origin/gh/SherlockNoMad/10/orig
2025-12-04T09:33:41.3821256Z  * [new branch]              gh/SherlockNoMad/11/base    -> origin/gh/SherlockNoMad/11/base
2025-12-04T09:33:41.3822493Z  * [new branch]              gh/SherlockNoMad/11/head    -> origin/gh/SherlockNoMad/11/head
2025-12-04T09:33:41.3823909Z  * [new branch]              gh/SherlockNoMad/11/orig    -> origin/gh/SherlockNoMad/11/orig
2025-12-04T09:33:41.3825282Z  * [new branch]              gh/SherlockNoMad/12/base    -> origin/gh/SherlockNoMad/12/base
2025-12-04T09:33:41.3826562Z  * [new branch]              gh/SherlockNoMad/12/head    -> origin/gh/SherlockNoMad/12/head
2025-12-04T09:33:41.3827824Z  * [new branch]              gh/SherlockNoMad/12/orig    -> origin/gh/SherlockNoMad/12/orig
2025-12-04T09:33:41.3829699Z  * [new branch]              gh/SherlockNoMad/15/base    -> origin/gh/SherlockNoMad/15/base
2025-12-04T09:33:41.3830965Z  * [new branch]              gh/SherlockNoMad/15/head    -> origin/gh/SherlockNoMad/15/head
2025-12-04T09:33:41.3832320Z  * [new branch]              gh/SherlockNoMad/15/orig    -> origin/gh/SherlockNoMad/15/orig
2025-12-04T09:33:41.3834014Z  * [new branch]              gh/SherlockNoMad/17/base    -> origin/gh/SherlockNoMad/17/base
2025-12-04T09:33:41.3835228Z  * [new branch]              gh/SherlockNoMad/17/head    -> origin/gh/SherlockNoMad/17/head
2025-12-04T09:33:41.3836469Z  * [new branch]              gh/SherlockNoMad/17/orig    -> origin/gh/SherlockNoMad/17/orig
2025-12-04T09:33:41.3838431Z  * [new branch]              gh/SherlockNoMad/18/base    -> origin/gh/SherlockNoMad/18/base
2025-12-04T09:33:41.3839669Z  * [new branch]              gh/SherlockNoMad/18/head    -> origin/gh/SherlockNoMad/18/head
2025-12-04T09:33:41.3840984Z  * [new branch]              gh/SherlockNoMad/18/orig    -> origin/gh/SherlockNoMad/18/orig
2025-12-04T09:33:41.3842682Z  * [new branch]              gh/SherlockNoMad/19/base    -> origin/gh/SherlockNoMad/19/base
2025-12-04T09:33:41.3844018Z  * [new branch]              gh/SherlockNoMad/19/head    -> origin/gh/SherlockNoMad/19/head
2025-12-04T09:33:41.3845371Z  * [new branch]              gh/SherlockNoMad/19/orig    -> origin/gh/SherlockNoMad/19/orig
2025-12-04T09:33:41.3847036Z  * [new branch]              gh/SherlockNoMad/2/base     -> origin/gh/SherlockNoMad/2/base
2025-12-04T09:33:41.3848247Z  * [new branch]              gh/SherlockNoMad/2/head     -> origin/gh/SherlockNoMad/2/head
2025-12-04T09:33:41.3849930Z  * [new branch]              gh/SherlockNoMad/20/base    -> origin/gh/SherlockNoMad/20/base
2025-12-04T09:33:41.3851225Z  * [new branch]              gh/SherlockNoMad/20/head    -> origin/gh/SherlockNoMad/20/head
2025-12-04T09:33:41.3852355Z  * [new branch]              gh/SherlockNoMad/20/orig    -> origin/gh/SherlockNoMad/20/orig
2025-12-04T09:33:41.3854430Z  * [new branch]              gh/SherlockNoMad/21/base    -> origin/gh/SherlockNoMad/21/base
2025-12-04T09:33:41.3855696Z  * [new branch]              gh/SherlockNoMad/21/head    -> origin/gh/SherlockNoMad/21/head
2025-12-04T09:33:41.3856909Z  * [new branch]              gh/SherlockNoMad/21/orig    -> origin/gh/SherlockNoMad/21/orig
2025-12-04T09:33:41.3858517Z  * [new branch]              gh/SherlockNoMad/3/base     -> origin/gh/SherlockNoMad/3/base
2025-12-04T09:33:41.3859694Z  * [new branch]              gh/SherlockNoMad/3/head     -> origin/gh/SherlockNoMad/3/head
2025-12-04T09:33:41.3861381Z  * [new branch]              gh/SherlockNoMad/4/base     -> origin/gh/SherlockNoMad/4/base
2025-12-04T09:33:41.3862513Z  * [new branch]              gh/SherlockNoMad/4/head     -> origin/gh/SherlockNoMad/4/head
2025-12-04T09:33:41.3864261Z  * [new branch]              gh/SherlockNoMad/5/base     -> origin/gh/SherlockNoMad/5/base
2025-12-04T09:33:41.3865414Z  * [new branch]              gh/SherlockNoMad/5/head     -> origin/gh/SherlockNoMad/5/head
2025-12-04T09:33:41.3868044Z  * [new branch]              gh/Sidharth123-cpu/24/base  -> origin/gh/Sidharth123-cpu/24/base
2025-12-04T09:33:41.3869606Z  * [new branch]              gh/Sidharth123-cpu/25/base  -> origin/gh/Sidharth123-cpu/25/base
2025-12-04T09:33:41.3871174Z  * [new branch]              gh/Sidharth123-cpu/26/base  -> origin/gh/Sidharth123-cpu/26/base
2025-12-04T09:33:41.3872995Z  * [new branch]              gh/Sidharth123-cpu/27/base  -> origin/gh/Sidharth123-cpu/27/base
2025-12-04T09:33:41.3875140Z  * [new branch]              gh/StrongerXi/1/base        -> origin/gh/StrongerXi/1/base
2025-12-04T09:33:41.3876243Z  * [new branch]              gh/StrongerXi/1/head        -> origin/gh/StrongerXi/1/head
2025-12-04T09:33:41.3878119Z  * [new branch]              gh/StrongerXi/71/base       -> origin/gh/StrongerXi/71/base
2025-12-04T09:33:41.3879334Z  * [new branch]              gh/StrongerXi/71/head       -> origin/gh/StrongerXi/71/head
2025-12-04T09:33:41.3880989Z  * [new branch]              gh/StrongerXi/72/base       -> origin/gh/StrongerXi/72/base
2025-12-04T09:33:41.3882255Z  * [new branch]              gh/StrongerXi/72/head       -> origin/gh/StrongerXi/72/head
2025-12-04T09:33:41.3884127Z  * [new branch]              gh/StrongerXi/73/base       -> origin/gh/StrongerXi/73/base
2025-12-04T09:33:41.3885267Z  * [new branch]              gh/StrongerXi/73/head       -> origin/gh/StrongerXi/73/head
2025-12-04T09:33:41.3886574Z  * [new branch]              gh/StrongerXi/73/orig       -> origin/gh/StrongerXi/73/orig
2025-12-04T09:33:41.3888896Z  * [new branch]              gh/XilunWu/160/base         -> origin/gh/XilunWu/160/base
2025-12-04T09:33:41.3890064Z  * [new branch]              gh/XilunWu/160/head         -> origin/gh/XilunWu/160/head
2025-12-04T09:33:41.3891381Z  * [new branch]              gh/XilunWu/160/orig         -> origin/gh/XilunWu/160/orig
2025-12-04T09:33:41.3893162Z  * [new branch]              gh/XilunWu/163/base         -> origin/gh/XilunWu/163/base
2025-12-04T09:33:41.3894623Z  * [new branch]              gh/XilunWu/163/head         -> origin/gh/XilunWu/163/head
2025-12-04T09:33:41.3895828Z  * [new branch]              gh/XilunWu/163/orig         -> origin/gh/XilunWu/163/orig
2025-12-04T09:33:41.3897802Z  * [new branch]              gh/XilunWu/168/base         -> origin/gh/XilunWu/168/base
2025-12-04T09:33:41.3898944Z  * [new branch]              gh/XilunWu/168/head         -> origin/gh/XilunWu/168/head
2025-12-04T09:33:41.3900370Z  * [new branch]              gh/XilunWu/168/orig         -> origin/gh/XilunWu/168/orig
2025-12-04T09:33:41.3902400Z  * [new branch]              gh/XilunWu/169/base         -> origin/gh/XilunWu/169/base
2025-12-04T09:33:41.3903659Z  * [new branch]              gh/XilunWu/169/head         -> origin/gh/XilunWu/169/head
2025-12-04T09:33:41.3904920Z  * [new branch]              gh/XilunWu/169/orig         -> origin/gh/XilunWu/169/orig
2025-12-04T09:33:41.3906544Z  * [new branch]              gh/XilunWu/170/base         -> origin/gh/XilunWu/170/base
2025-12-04T09:33:41.3907737Z  * [new branch]              gh/XilunWu/170/head         -> origin/gh/XilunWu/170/head
2025-12-04T09:33:41.3909170Z  * [new branch]              gh/XilunWu/170/orig         -> origin/gh/XilunWu/170/orig
2025-12-04T09:33:41.3911048Z  * [new branch]              gh/XilunWu/171/base         -> origin/gh/XilunWu/171/base
2025-12-04T09:33:41.3912205Z  * [new branch]              gh/XilunWu/171/head         -> origin/gh/XilunWu/171/head
2025-12-04T09:33:41.3913676Z  * [new branch]              gh/XilunWu/171/orig         -> origin/gh/XilunWu/171/orig
2025-12-04T09:33:41.3915298Z  * [new branch]              gh/XilunWu/173/base         -> origin/gh/XilunWu/173/base
2025-12-04T09:33:41.3916567Z  * [new branch]              gh/XilunWu/173/head         -> origin/gh/XilunWu/173/head
2025-12-04T09:33:41.3917884Z  * [new branch]              gh/XilunWu/173/orig         -> origin/gh/XilunWu/173/orig
2025-12-04T09:33:41.3919646Z  * [new branch]              gh/XilunWu/175/base         -> origin/gh/XilunWu/175/base
2025-12-04T09:33:41.3920892Z  * [new branch]              gh/XilunWu/175/head         -> origin/gh/XilunWu/175/head
2025-12-04T09:33:41.3922257Z  * [new branch]              gh/XilunWu/175/orig         -> origin/gh/XilunWu/175/orig
2025-12-04T09:33:41.3924170Z  * [new branch]              gh/XilunWu/176/base         -> origin/gh/XilunWu/176/base
2025-12-04T09:33:41.3925387Z  * [new branch]              gh/XilunWu/176/head         -> origin/gh/XilunWu/176/head
2025-12-04T09:33:41.3926813Z  * [new branch]              gh/XilunWu/176/orig         -> origin/gh/XilunWu/176/orig
2025-12-04T09:33:41.3928891Z  * [new branch]              gh/XuehaiPan/14/base        -> origin/gh/XuehaiPan/14/base
2025-12-04T09:33:41.3930175Z  * [new branch]              gh/XuehaiPan/14/head        -> origin/gh/XuehaiPan/14/head
2025-12-04T09:33:41.3931440Z  * [new branch]              gh/XuehaiPan/14/orig        -> origin/gh/XuehaiPan/14/orig
2025-12-04T09:33:41.3933279Z  * [new branch]              gh/XuehaiPan/179/base       -> origin/gh/XuehaiPan/179/base
2025-12-04T09:33:41.3934496Z  * [new branch]              gh/XuehaiPan/179/head       -> origin/gh/XuehaiPan/179/head
2025-12-04T09:33:41.3935973Z  * [new branch]              gh/XuehaiPan/179/orig       -> origin/gh/XuehaiPan/179/orig
2025-12-04T09:33:41.3937575Z  * [new branch]              gh/XuehaiPan/249/base       -> origin/gh/XuehaiPan/249/base
2025-12-04T09:33:41.3939107Z  * [new branch]              gh/XuehaiPan/249/head       -> origin/gh/XuehaiPan/249/head
2025-12-04T09:33:41.3940425Z  * [new branch]              gh/XuehaiPan/249/orig       -> origin/gh/XuehaiPan/249/orig
2025-12-04T09:33:41.3942240Z  * [new branch]              gh/XuehaiPan/253/base       -> origin/gh/XuehaiPan/253/base
2025-12-04T09:33:41.3943480Z  * [new branch]              gh/XuehaiPan/253/head       -> origin/gh/XuehaiPan/253/head
2025-12-04T09:33:41.3944734Z  * [new branch]              gh/XuehaiPan/253/orig       -> origin/gh/XuehaiPan/253/orig
2025-12-04T09:33:41.3946602Z  * [new branch]              gh/XuehaiPan/254/base       -> origin/gh/XuehaiPan/254/base
2025-12-04T09:33:41.3947867Z  * [new branch]              gh/XuehaiPan/254/head       -> origin/gh/XuehaiPan/254/head
2025-12-04T09:33:41.3949204Z  * [new branch]              gh/XuehaiPan/254/orig       -> origin/gh/XuehaiPan/254/orig
2025-12-04T09:33:41.3950882Z  * [new branch]              gh/XuehaiPan/255/base       -> origin/gh/XuehaiPan/255/base
2025-12-04T09:33:41.3952074Z  * [new branch]              gh/XuehaiPan/255/head       -> origin/gh/XuehaiPan/255/head
2025-12-04T09:33:41.3953377Z  * [new branch]              gh/XuehaiPan/255/orig       -> origin/gh/XuehaiPan/255/orig
2025-12-04T09:33:41.3955208Z  * [new branch]              gh/XuehaiPan/271/base       -> origin/gh/XuehaiPan/271/base
2025-12-04T09:33:41.3956413Z  * [new branch]              gh/XuehaiPan/271/head       -> origin/gh/XuehaiPan/271/head
2025-12-04T09:33:41.3957669Z  * [new branch]              gh/XuehaiPan/271/orig       -> origin/gh/XuehaiPan/271/orig
2025-12-04T09:33:41.3959440Z  * [new branch]              gh/XuehaiPan/343/base       -> origin/gh/XuehaiPan/343/base
2025-12-04T09:33:41.3960639Z  * [new branch]              gh/XuehaiPan/343/head       -> origin/gh/XuehaiPan/343/head
2025-12-04T09:33:41.3961898Z  * [new branch]              gh/XuehaiPan/343/orig       -> origin/gh/XuehaiPan/343/orig
2025-12-04T09:33:41.3963868Z  * [new branch]              gh/XuehaiPan/347/base       -> origin/gh/XuehaiPan/347/base
2025-12-04T09:33:41.3965155Z  * [new branch]              gh/XuehaiPan/347/head       -> origin/gh/XuehaiPan/347/head
2025-12-04T09:33:41.3966473Z  * [new branch]              gh/XuehaiPan/347/orig       -> origin/gh/XuehaiPan/347/orig
2025-12-04T09:33:41.3968320Z  * [new branch]              gh/XuehaiPan/348/base       -> origin/gh/XuehaiPan/348/base
2025-12-04T09:33:41.3969493Z  * [new branch]              gh/XuehaiPan/348/head       -> origin/gh/XuehaiPan/348/head
2025-12-04T09:33:41.3970775Z  * [new branch]              gh/XuehaiPan/348/orig       -> origin/gh/XuehaiPan/348/orig
2025-12-04T09:33:41.3972556Z  * [new branch]              gh/XuehaiPan/350/base       -> origin/gh/XuehaiPan/350/base
2025-12-04T09:33:41.3973806Z  * [new branch]              gh/XuehaiPan/350/head       -> origin/gh/XuehaiPan/350/head
2025-12-04T09:33:41.3975039Z  * [new branch]              gh/XuehaiPan/350/orig       -> origin/gh/XuehaiPan/350/orig
2025-12-04T09:33:41.3976974Z  * [new branch]              gh/XuehaiPan/365/base       -> origin/gh/XuehaiPan/365/base
2025-12-04T09:33:41.3978087Z  * [new branch]              gh/XuehaiPan/365/head       -> origin/gh/XuehaiPan/365/head
2025-12-04T09:33:41.3979358Z  * [new branch]              gh/XuehaiPan/365/orig       -> origin/gh/XuehaiPan/365/orig
2025-12-04T09:33:41.3981211Z  * [new branch]              gh/XuehaiPan/366/base       -> origin/gh/XuehaiPan/366/base
2025-12-04T09:33:41.3982398Z  * [new branch]              gh/XuehaiPan/366/head       -> origin/gh/XuehaiPan/366/head
2025-12-04T09:33:41.3984661Z  * [new branch]              gh/XuehaiPan/370/base       -> origin/gh/XuehaiPan/370/base
2025-12-04T09:33:41.3985888Z  * [new branch]              gh/XuehaiPan/370/head       -> origin/gh/XuehaiPan/370/head
2025-12-04T09:33:41.3987396Z  * [new branch]              gh/XuehaiPan/370/orig       -> origin/gh/XuehaiPan/370/orig
2025-12-04T09:33:41.3989031Z  * [new branch]              gh/XuehaiPan/390/base       -> origin/gh/XuehaiPan/390/base
2025-12-04T09:33:41.3990233Z  * [new branch]              gh/XuehaiPan/390/head       -> origin/gh/XuehaiPan/390/head
2025-12-04T09:33:41.3991517Z  * [new branch]              gh/XuehaiPan/390/orig       -> origin/gh/XuehaiPan/390/orig
2025-12-04T09:33:41.3993292Z  * [new branch]              gh/XuehaiPan/391/base       -> origin/gh/XuehaiPan/391/base
2025-12-04T09:33:41.3994494Z  * [new branch]              gh/XuehaiPan/391/head       -> origin/gh/XuehaiPan/391/head
2025-12-04T09:33:41.3995735Z  * [new branch]              gh/XuehaiPan/391/orig       -> origin/gh/XuehaiPan/391/orig
2025-12-04T09:33:41.3997543Z  * [new branch]              gh/XuehaiPan/392/base       -> origin/gh/XuehaiPan/392/base
2025-12-04T09:33:41.3998726Z  * [new branch]              gh/XuehaiPan/392/head       -> origin/gh/XuehaiPan/392/head
2025-12-04T09:33:41.4000062Z  * [new branch]              gh/XuehaiPan/392/orig       -> origin/gh/XuehaiPan/392/orig
2025-12-04T09:33:41.4005124Z  * [new branch]              gh/XuehaiPan/394/base       -> origin/gh/XuehaiPan/394/base
2025-12-04T09:33:41.4006362Z  * [new branch]              gh/XuehaiPan/394/head       -> origin/gh/XuehaiPan/394/head
2025-12-04T09:33:41.4007675Z  * [new branch]              gh/XuehaiPan/394/orig       -> origin/gh/XuehaiPan/394/orig
2025-12-04T09:33:41.4009498Z  * [new branch]              gh/XuehaiPan/397/base       -> origin/gh/XuehaiPan/397/base
2025-12-04T09:33:41.4010746Z  * [new branch]              gh/XuehaiPan/397/head       -> origin/gh/XuehaiPan/397/head
2025-12-04T09:33:41.4011988Z  * [new branch]              gh/XuehaiPan/397/orig       -> origin/gh/XuehaiPan/397/orig
2025-12-04T09:33:41.4013849Z  * [new branch]              gh/XuehaiPan/398/base       -> origin/gh/XuehaiPan/398/base
2025-12-04T09:33:41.4015082Z  * [new branch]              gh/XuehaiPan/398/head       -> origin/gh/XuehaiPan/398/head
2025-12-04T09:33:41.4016340Z  * [new branch]              gh/XuehaiPan/398/orig       -> origin/gh/XuehaiPan/398/orig
2025-12-04T09:33:41.4018105Z  * [new branch]              gh/XuehaiPan/399/base       -> origin/gh/XuehaiPan/399/base
2025-12-04T09:33:41.4019323Z  * [new branch]              gh/XuehaiPan/399/head       -> origin/gh/XuehaiPan/399/head
2025-12-04T09:33:41.4020632Z  * [new branch]              gh/XuehaiPan/399/orig       -> origin/gh/XuehaiPan/399/orig
2025-12-04T09:33:41.4022577Z  * [new branch]              gh/XuehaiPan/400/base       -> origin/gh/XuehaiPan/400/base
2025-12-04T09:33:41.4023761Z  * [new branch]              gh/XuehaiPan/400/head       -> origin/gh/XuehaiPan/400/head
2025-12-04T09:33:41.4025042Z  * [new branch]              gh/XuehaiPan/400/orig       -> origin/gh/XuehaiPan/400/orig
2025-12-04T09:33:41.4027168Z  * [new branch]              gh/ZhiweiYan-96/39/base     -> origin/gh/ZhiweiYan-96/39/base
2025-12-04T09:33:41.4028397Z  * [new branch]              gh/ZhiweiYan-96/39/head     -> origin/gh/ZhiweiYan-96/39/head
2025-12-04T09:33:41.4029696Z  * [new branch]              gh/ZhiweiYan-96/39/orig     -> origin/gh/ZhiweiYan-96/39/orig
2025-12-04T09:33:41.4031667Z  * [new branch]              gh/ZhiweiYan-96/44/base     -> origin/gh/ZhiweiYan-96/44/base
2025-12-04T09:33:41.4032833Z  * [new branch]              gh/ZhiweiYan-96/44/head     -> origin/gh/ZhiweiYan-96/44/head
2025-12-04T09:33:41.4034549Z  * [new branch]              gh/ZhiweiYan-96/45/base     -> origin/gh/ZhiweiYan-96/45/base
2025-12-04T09:33:41.4035688Z  * [new branch]              gh/ZhiweiYan-96/45/head     -> origin/gh/ZhiweiYan-96/45/head
2025-12-04T09:33:41.4037631Z  * [new branch]              gh/ZhiweiYan-96/49/base     -> origin/gh/ZhiweiYan-96/49/base
2025-12-04T09:33:41.4038854Z  * [new branch]              gh/ZhiweiYan-96/49/head     -> origin/gh/ZhiweiYan-96/49/head
2025-12-04T09:33:41.4040635Z  * [new branch]              gh/ZhiweiYan-96/62/base     -> origin/gh/ZhiweiYan-96/62/base
2025-12-04T09:33:41.4041826Z  * [new branch]              gh/ZhiweiYan-96/62/head     -> origin/gh/ZhiweiYan-96/62/head
2025-12-04T09:33:41.4043850Z  * [new branch]              gh/ZhiweiYan-96/66/base     -> origin/gh/ZhiweiYan-96/66/base
2025-12-04T09:33:41.4045071Z  * [new branch]              gh/ZhiweiYan-96/66/head     -> origin/gh/ZhiweiYan-96/66/head
2025-12-04T09:33:41.4046807Z  * [new branch]              gh/ZhiweiYan-96/67/base     -> origin/gh/ZhiweiYan-96/67/base
2025-12-04T09:33:41.4047952Z  * [new branch]              gh/ZhiweiYan-96/67/head     -> origin/gh/ZhiweiYan-96/67/head
2025-12-04T09:33:41.4049739Z  * [new branch]              gh/ZhiweiYan-96/68/base     -> origin/gh/ZhiweiYan-96/68/base
2025-12-04T09:33:41.4050807Z  * [new branch]              gh/ZhiweiYan-96/68/head     -> origin/gh/ZhiweiYan-96/68/head
2025-12-04T09:33:41.4052103Z  * [new branch]              gh/ZhiweiYan-96/68/orig     -> origin/gh/ZhiweiYan-96/68/orig
2025-12-04T09:33:41.4054445Z  * [new branch]              gh/aakhundov/1/base         -> origin/gh/aakhundov/1/base
2025-12-04T09:33:41.4055710Z  * [new branch]              gh/aakhundov/1/head         -> origin/gh/aakhundov/1/head
2025-12-04T09:33:41.4057430Z  * [new branch]              gh/aakhundov/2/base         -> origin/gh/aakhundov/2/base
2025-12-04T09:33:41.4058658Z  * [new branch]              gh/aakhundov/2/head         -> origin/gh/aakhundov/2/head
2025-12-04T09:33:41.4060528Z  * [new branch]              gh/aditew01/openblas        -> origin/gh/aditew01/openblas
2025-12-04T09:33:41.4061656Z  * [new branch]              gh/aditew01/sbgemm          -> origin/gh/aditew01/sbgemm
2025-12-04T09:33:41.4062950Z  * [new branch]              gh/aditew01/vecbf16         -> origin/gh/aditew01/vecbf16
2025-12-04T09:33:41.4065007Z  * [new branch]              gh/albanD/4/base            -> origin/gh/albanD/4/base
2025-12-04T09:33:41.4066180Z  * [new branch]              gh/albanD/4/head            -> origin/gh/albanD/4/head
2025-12-04T09:33:41.4067561Z  * [new branch]              gh/albanD/4/orig            -> origin/gh/albanD/4/orig
2025-12-04T09:33:41.4069782Z  * [new branch]              gh/alexbrauckmann/paddedtensor_faketensor_init -> origin/gh/alexbrauckmann/paddedtensor_faketensor_init
2025-12-04T09:33:41.4071611Z  * [new branch]              gh/alexsamardzic/12/base    -> origin/gh/alexsamardzic/12/base
2025-12-04T09:33:41.4072840Z  * [new branch]              gh/alexsamardzic/12/head    -> origin/gh/alexsamardzic/12/head
2025-12-04T09:33:41.4074166Z  * [new branch]              gh/alexsamardzic/12/orig    -> origin/gh/alexsamardzic/12/orig
2025-12-04T09:33:41.4075989Z  * [new branch]              gh/alexsamardzic/14/base    -> origin/gh/alexsamardzic/14/base
2025-12-04T09:33:41.4077150Z  * [new branch]              gh/alexsamardzic/14/head    -> origin/gh/alexsamardzic/14/head
2025-12-04T09:33:41.4078501Z  * [new branch]              gh/alexsamardzic/14/orig    -> origin/gh/alexsamardzic/14/orig
2025-12-04T09:33:41.4080305Z  * [new branch]              gh/alexsamardzic/15/base    -> origin/gh/alexsamardzic/15/base
2025-12-04T09:33:41.4081533Z  * [new branch]              gh/alexsamardzic/15/head    -> origin/gh/alexsamardzic/15/head
2025-12-04T09:33:41.4083027Z  * [new branch]              gh/alexsamardzic/15/orig    -> origin/gh/alexsamardzic/15/orig
2025-12-04T09:33:41.4085278Z  * [new branch]              gh/amjames/18/base          -> origin/gh/amjames/18/base
2025-12-04T09:33:41.4086449Z  * [new branch]              gh/amjames/18/head          -> origin/gh/amjames/18/head
2025-12-04T09:33:41.4087746Z  * [new branch]              gh/amjames/18/orig          -> origin/gh/amjames/18/orig
2025-12-04T09:33:41.4090109Z  * [new branch]              gh/andrewor14/35/base       -> origin/gh/andrewor14/35/base
2025-12-04T09:33:41.4091409Z  * [new branch]              gh/andrewor14/35/head       -> origin/gh/andrewor14/35/head
2025-12-04T09:33:41.4092864Z  * [new branch]              gh/andrewor14/35/orig       -> origin/gh/andrewor14/35/orig
2025-12-04T09:33:41.4094779Z  * [new branch]              gh/andrewor14/50/base       -> origin/gh/andrewor14/50/base
2025-12-04T09:33:41.4096096Z  * [new branch]              gh/andrewor14/50/head       -> origin/gh/andrewor14/50/head
2025-12-04T09:33:41.4097567Z  * [new branch]              gh/andrewor14/50/orig       -> origin/gh/andrewor14/50/orig
2025-12-04T09:33:41.4099748Z  * [new branch]              gh/andyanwang/30/base       -> origin/gh/andyanwang/30/base
2025-12-04T09:33:41.4101499Z  * [new branch]              gh/andyanwang/30/orig       -> origin/gh/andyanwang/30/orig
2025-12-04T09:33:41.4103513Z  * [new branch]              gh/andyanwang/31/base       -> origin/gh/andyanwang/31/base
2025-12-04T09:33:41.4105010Z  * [new branch]              gh/andyanwang/31/orig       -> origin/gh/andyanwang/31/orig
2025-12-04T09:33:41.4106814Z  * [new branch]              gh/andyanwang/39/base       -> origin/gh/andyanwang/39/base
2025-12-04T09:33:41.4108095Z  * [new branch]              gh/andyanwang/39/head       -> origin/gh/andyanwang/39/head
2025-12-04T09:33:41.4109417Z  * [new branch]              gh/andyanwang/39/orig       -> origin/gh/andyanwang/39/orig
2025-12-04T09:33:41.4111470Z  * [new branch]              gh/andyanwang/42/base       -> origin/gh/andyanwang/42/base
2025-12-04T09:33:41.4112651Z  * [new branch]              gh/andyanwang/42/head       -> origin/gh/andyanwang/42/head
2025-12-04T09:33:41.4113913Z  * [new branch]              gh/andyanwang/42/orig       -> origin/gh/andyanwang/42/orig
2025-12-04T09:33:41.4115838Z  * [new branch]              gh/andyanwang/45/base       -> origin/gh/andyanwang/45/base
2025-12-04T09:33:41.4117147Z  * [new branch]              gh/andyanwang/45/head       -> origin/gh/andyanwang/45/head
2025-12-04T09:33:41.4118452Z  * [new branch]              gh/andyanwang/45/orig       -> origin/gh/andyanwang/45/orig
2025-12-04T09:33:41.4120693Z  * [new branch]              gh/angelayi/107/base        -> origin/gh/angelayi/107/base
2025-12-04T09:33:41.4121854Z  * [new branch]              gh/angelayi/107/head        -> origin/gh/angelayi/107/head
2025-12-04T09:33:41.4123808Z  * [new branch]              gh/angelayi/114/base        -> origin/gh/angelayi/114/base
2025-12-04T09:33:41.4125119Z  * [new branch]              gh/angelayi/114/head        -> origin/gh/angelayi/114/head
2025-12-04T09:33:41.4126392Z  * [new branch]              gh/angelayi/114/orig        -> origin/gh/angelayi/114/orig
2025-12-04T09:33:41.4128128Z  * [new branch]              gh/angelayi/116/base        -> origin/gh/angelayi/116/base
2025-12-04T09:33:41.4129306Z  * [new branch]              gh/angelayi/116/head        -> origin/gh/angelayi/116/head
2025-12-04T09:33:41.4130728Z  * [new branch]              gh/angelayi/116/orig        -> origin/gh/angelayi/116/orig
2025-12-04T09:33:41.4132568Z  * [new branch]              gh/angelayi/122/base        -> origin/gh/angelayi/122/base
2025-12-04T09:33:41.4133703Z  * [new branch]              gh/angelayi/122/head        -> origin/gh/angelayi/122/head
2025-12-04T09:33:41.4134963Z  * [new branch]              gh/angelayi/122/orig        -> origin/gh/angelayi/122/orig
2025-12-04T09:33:41.4137042Z  * [new branch]              gh/angelayi/124/base        -> origin/gh/angelayi/124/base
2025-12-04T09:33:41.4138322Z  * [new branch]              gh/angelayi/124/head        -> origin/gh/angelayi/124/head
2025-12-04T09:33:41.4139484Z  * [new branch]              gh/angelayi/124/orig        -> origin/gh/angelayi/124/orig
2025-12-04T09:33:41.4141355Z  * [new branch]              gh/angelayi/128/base        -> origin/gh/angelayi/128/base
2025-12-04T09:33:41.4142537Z  * [new branch]              gh/angelayi/128/head        -> origin/gh/angelayi/128/head
2025-12-04T09:33:41.4143807Z  * [new branch]              gh/angelayi/128/orig        -> origin/gh/angelayi/128/orig
2025-12-04T09:33:41.4145696Z  * [new branch]              gh/angelayi/131/base        -> origin/gh/angelayi/131/base
2025-12-04T09:33:41.4146877Z  * [new branch]              gh/angelayi/131/head        -> origin/gh/angelayi/131/head
2025-12-04T09:33:41.4148152Z  * [new branch]              gh/angelayi/131/orig        -> origin/gh/angelayi/131/orig
2025-12-04T09:33:41.4150315Z  * [new branch]              gh/angelayi/132/base        -> origin/gh/angelayi/132/base
2025-12-04T09:33:41.4151784Z  * [new branch]              gh/angelayi/132/head        -> origin/gh/angelayi/132/head
2025-12-04T09:33:41.4153217Z  * [new branch]              gh/angelayi/132/orig        -> origin/gh/angelayi/132/orig
2025-12-04T09:33:41.4154959Z  * [new branch]              gh/angelayi/133/base        -> origin/gh/angelayi/133/base
2025-12-04T09:33:41.4156214Z  * [new branch]              gh/angelayi/133/head        -> origin/gh/angelayi/133/head
2025-12-04T09:33:41.4157481Z  * [new branch]              gh/angelayi/133/orig        -> origin/gh/angelayi/133/orig
2025-12-04T09:33:41.4159613Z  * [new branch]              gh/angelayi/134/base        -> origin/gh/angelayi/134/base
2025-12-04T09:33:41.4161041Z  * [new branch]              gh/angelayi/134/head        -> origin/gh/angelayi/134/head
2025-12-04T09:33:41.4162329Z  * [new branch]              gh/angelayi/134/orig        -> origin/gh/angelayi/134/orig
2025-12-04T09:33:41.4164474Z  * [new branch]              gh/angelayi/135/base        -> origin/gh/angelayi/135/base
2025-12-04T09:33:41.4165741Z  * [new branch]              gh/angelayi/135/head        -> origin/gh/angelayi/135/head
2025-12-04T09:33:41.4167036Z  * [new branch]              gh/angelayi/135/orig        -> origin/gh/angelayi/135/orig
2025-12-04T09:33:41.4168777Z  * [new branch]              gh/angelayi/136/base        -> origin/gh/angelayi/136/base
2025-12-04T09:33:41.4170028Z  * [new branch]              gh/angelayi/136/head        -> origin/gh/angelayi/136/head
2025-12-04T09:33:41.4171287Z  * [new branch]              gh/angelayi/136/orig        -> origin/gh/angelayi/136/orig
2025-12-04T09:33:41.4173184Z  * [new branch]              gh/angelayi/137/base        -> origin/gh/angelayi/137/base
2025-12-04T09:33:41.4174316Z  * [new branch]              gh/angelayi/137/head        -> origin/gh/angelayi/137/head
2025-12-04T09:33:41.4175870Z  * [new branch]              gh/angelayi/137/orig        -> origin/gh/angelayi/137/orig
2025-12-04T09:33:41.4177499Z  * [new branch]              gh/angelayi/138/base        -> origin/gh/angelayi/138/base
2025-12-04T09:33:41.4178636Z  * [new branch]              gh/angelayi/138/head        -> origin/gh/angelayi/138/head
2025-12-04T09:33:41.4180120Z  * [new branch]              gh/angelayi/138/orig        -> origin/gh/angelayi/138/orig
2025-12-04T09:33:41.4181828Z  * [new branch]              gh/angelayi/139/base        -> origin/gh/angelayi/139/base
2025-12-04T09:33:41.4183094Z  * [new branch]              gh/angelayi/139/head        -> origin/gh/angelayi/139/head
2025-12-04T09:33:41.4184364Z  * [new branch]              gh/angelayi/139/orig        -> origin/gh/angelayi/139/orig
2025-12-04T09:33:41.4186200Z  * [new branch]              gh/angelayi/140/base        -> origin/gh/angelayi/140/base
2025-12-04T09:33:41.4187522Z  * [new branch]              gh/angelayi/140/head        -> origin/gh/angelayi/140/head
2025-12-04T09:33:41.4188821Z  * [new branch]              gh/angelayi/140/orig        -> origin/gh/angelayi/140/orig
2025-12-04T09:33:41.4191653Z  * [new branch]              gh/angelayi/141/base        -> origin/gh/angelayi/141/base
2025-12-04T09:33:41.4192502Z  * [new branch]              gh/angelayi/141/head        -> origin/gh/angelayi/141/head
2025-12-04T09:33:41.4193764Z  * [new branch]              gh/angelayi/141/orig        -> origin/gh/angelayi/141/orig
2025-12-04T09:33:41.4195615Z  * [new branch]              gh/angelayi/142/base        -> origin/gh/angelayi/142/base
2025-12-04T09:33:41.4196831Z  * [new branch]              gh/angelayi/142/head        -> origin/gh/angelayi/142/head
2025-12-04T09:33:41.4198125Z  * [new branch]              gh/angelayi/142/orig        -> origin/gh/angelayi/142/orig
2025-12-04T09:33:41.4199892Z  * [new branch]              gh/angelayi/143/base        -> origin/gh/angelayi/143/base
2025-12-04T09:33:41.4201236Z  * [new branch]              gh/angelayi/143/head        -> origin/gh/angelayi/143/head
2025-12-04T09:33:41.4202852Z  * [new branch]              gh/angelayi/143/orig        -> origin/gh/angelayi/143/orig
2025-12-04T09:33:41.4204658Z  * [new branch]              gh/angelayi/144/base        -> origin/gh/angelayi/144/base
2025-12-04T09:33:41.4206099Z  * [new branch]              gh/angelayi/144/head        -> origin/gh/angelayi/144/head
2025-12-04T09:33:41.4207300Z  * [new branch]              gh/angelayi/144/orig        -> origin/gh/angelayi/144/orig
2025-12-04T09:33:41.4209778Z  * [new branch]              gh/anijain2305/753/base     -> origin/gh/anijain2305/753/base
2025-12-04T09:33:41.4210970Z  * [new branch]              gh/anijain2305/753/head     -> origin/gh/anijain2305/753/head
2025-12-04T09:33:41.4212216Z  * [new branch]              gh/anijain2305/753/orig     -> origin/gh/anijain2305/753/orig
2025-12-04T09:33:41.4214160Z  * [new branch]              gh/anijain2305/810/base     -> origin/gh/anijain2305/810/base
2025-12-04T09:33:41.4215414Z  * [new branch]              gh/anijain2305/810/head     -> origin/gh/anijain2305/810/head
2025-12-04T09:33:41.4217076Z  * [new branch]              gh/anijain2305/810/orig     -> origin/gh/anijain2305/810/orig
2025-12-04T09:33:41.4218537Z  * [new branch]              gh/anijain2305/854/base     -> origin/gh/anijain2305/854/base
2025-12-04T09:33:41.4220105Z  * [new branch]              gh/anijain2305/854/head     -> origin/gh/anijain2305/854/head
2025-12-04T09:33:41.4221289Z  * [new branch]              gh/anijain2305/854/orig     -> origin/gh/anijain2305/854/orig
2025-12-04T09:33:41.4223212Z  * [new branch]              gh/anijain2305/864/base     -> origin/gh/anijain2305/864/base
2025-12-04T09:33:41.4224413Z  * [new branch]              gh/anijain2305/864/head     -> origin/gh/anijain2305/864/head
2025-12-04T09:33:41.4225672Z  * [new branch]              gh/anijain2305/864/orig     -> origin/gh/anijain2305/864/orig
2025-12-04T09:33:41.4227669Z  * [new branch]              gh/anijain2305/870/base     -> origin/gh/anijain2305/870/base
2025-12-04T09:33:41.4228799Z  * [new branch]              gh/anijain2305/870/head     -> origin/gh/anijain2305/870/head
2025-12-04T09:33:41.4230116Z  * [new branch]              gh/anijain2305/870/orig     -> origin/gh/anijain2305/870/orig
2025-12-04T09:33:41.4232022Z  * [new branch]              gh/anijain2305/873/base     -> origin/gh/anijain2305/873/base
2025-12-04T09:33:41.4233158Z  * [new branch]              gh/anijain2305/873/head     -> origin/gh/anijain2305/873/head
2025-12-04T09:33:41.4234431Z  * [new branch]              gh/anijain2305/873/orig     -> origin/gh/anijain2305/873/orig
2025-12-04T09:33:41.4236267Z  * [new branch]              gh/anijain2305/894/base     -> origin/gh/anijain2305/894/base
2025-12-04T09:33:41.4237449Z  * [new branch]              gh/anijain2305/894/head     -> origin/gh/anijain2305/894/head
2025-12-04T09:33:41.4238780Z  * [new branch]              gh/anijain2305/894/orig     -> origin/gh/anijain2305/894/orig
2025-12-04T09:33:41.4240646Z  * [new branch]              gh/anijain2305/895/base     -> origin/gh/anijain2305/895/base
2025-12-04T09:33:41.4241877Z  * [new branch]              gh/anijain2305/895/head     -> origin/gh/anijain2305/895/head
2025-12-04T09:33:41.4243486Z  * [new branch]              gh/anijain2305/895/orig     -> origin/gh/anijain2305/895/orig
2025-12-04T09:33:41.4245219Z  * [new branch]              gh/anijain2305/910/base     -> origin/gh/anijain2305/910/base
2025-12-04T09:33:41.4246435Z  * [new branch]              gh/anijain2305/910/head     -> origin/gh/anijain2305/910/head
2025-12-04T09:33:41.4247757Z  * [new branch]              gh/anijain2305/910/orig     -> origin/gh/anijain2305/910/orig
2025-12-04T09:33:41.4249666Z  * [new branch]              gh/anijain2305/919/base     -> origin/gh/anijain2305/919/base
2025-12-04T09:33:41.4250940Z  * [new branch]              gh/anijain2305/919/head     -> origin/gh/anijain2305/919/head
2025-12-04T09:33:41.4252236Z  * [new branch]              gh/anijain2305/919/orig     -> origin/gh/anijain2305/919/orig
2025-12-04T09:33:41.4254040Z  * [new branch]              gh/anijain2305/922/base     -> origin/gh/anijain2305/922/base
2025-12-04T09:33:41.4255387Z  * [new branch]              gh/anijain2305/922/head     -> origin/gh/anijain2305/922/head
2025-12-04T09:33:41.4256646Z  * [new branch]              gh/anijain2305/922/orig     -> origin/gh/anijain2305/922/orig
2025-12-04T09:33:41.4258485Z  * [new branch]              gh/anijain2305/932/base     -> origin/gh/anijain2305/932/base
2025-12-04T09:33:41.4259841Z  * [new branch]              gh/anijain2305/932/head     -> origin/gh/anijain2305/932/head
2025-12-04T09:33:41.4261161Z  * [new branch]              gh/anijain2305/932/orig     -> origin/gh/anijain2305/932/orig
2025-12-04T09:33:41.4263048Z  * [new branch]              gh/anijain2305/940/base     -> origin/gh/anijain2305/940/base
2025-12-04T09:33:41.4264228Z  * [new branch]              gh/anijain2305/940/head     -> origin/gh/anijain2305/940/head
2025-12-04T09:33:41.4265492Z  * [new branch]              gh/anijain2305/940/orig     -> origin/gh/anijain2305/940/orig
2025-12-04T09:33:41.4267322Z  * [new branch]              gh/anijain2305/941/base     -> origin/gh/anijain2305/941/base
2025-12-04T09:33:41.4268610Z  * [new branch]              gh/anijain2305/941/head     -> origin/gh/anijain2305/941/head
2025-12-04T09:33:41.4269847Z  * [new branch]              gh/anijain2305/941/orig     -> origin/gh/anijain2305/941/orig
2025-12-04T09:33:41.4271666Z  * [new branch]              gh/anijain2305/942/base     -> origin/gh/anijain2305/942/base
2025-12-04T09:33:41.4272940Z  * [new branch]              gh/anijain2305/942/head     -> origin/gh/anijain2305/942/head
2025-12-04T09:33:41.4274400Z  * [new branch]              gh/anijain2305/942/orig     -> origin/gh/anijain2305/942/orig
2025-12-04T09:33:41.4276136Z  * [new branch]              gh/anijain2305/943/base     -> origin/gh/anijain2305/943/base
2025-12-04T09:33:41.4277302Z  * [new branch]              gh/anijain2305/943/head     -> origin/gh/anijain2305/943/head
2025-12-04T09:33:41.4278637Z  * [new branch]              gh/anijain2305/943/orig     -> origin/gh/anijain2305/943/orig
2025-12-04T09:33:41.4281138Z  * [new branch]              gh/anijain2305/944/base     -> origin/gh/anijain2305/944/base
2025-12-04T09:33:41.4282455Z  * [new branch]              gh/anijain2305/944/head     -> origin/gh/anijain2305/944/head
2025-12-04T09:33:41.4284666Z  * [new branch]              gh/anijain2305/944/orig     -> origin/gh/anijain2305/944/orig
2025-12-04T09:33:41.4286507Z  * [new branch]              gh/anijain2305/945/base     -> origin/gh/anijain2305/945/base
2025-12-04T09:33:41.4287804Z  * [new branch]              gh/anijain2305/945/head     -> origin/gh/anijain2305/945/head
2025-12-04T09:33:41.4289091Z  * [new branch]              gh/anijain2305/945/orig     -> origin/gh/anijain2305/945/orig
2025-12-04T09:33:41.4291004Z  * [new branch]              gh/anijain2305/946/base     -> origin/gh/anijain2305/946/base
2025-12-04T09:33:41.4292204Z  * [new branch]              gh/anijain2305/946/head     -> origin/gh/anijain2305/946/head
2025-12-04T09:33:41.4293465Z  * [new branch]              gh/anijain2305/946/orig     -> origin/gh/anijain2305/946/orig
2025-12-04T09:33:41.4295468Z  * [new branch]              gh/anijain2305/947/base     -> origin/gh/anijain2305/947/base
2025-12-04T09:33:41.4296496Z  * [new branch]              gh/anijain2305/947/head     -> origin/gh/anijain2305/947/head
2025-12-04T09:33:41.4297789Z  * [new branch]              gh/anijain2305/947/orig     -> origin/gh/anijain2305/947/orig
2025-12-04T09:33:41.4299984Z  * [new branch]              gh/anijain2305/948/base     -> origin/gh/anijain2305/948/base
2025-12-04T09:33:41.4301147Z  * [new branch]              gh/anijain2305/948/head     -> origin/gh/anijain2305/948/head
2025-12-04T09:33:41.4302471Z  * [new branch]              gh/anijain2305/948/orig     -> origin/gh/anijain2305/948/orig
2025-12-04T09:33:41.4304298Z  * [new branch]              gh/anijain2305/949/base     -> origin/gh/anijain2305/949/base
2025-12-04T09:33:41.4305484Z  * [new branch]              gh/anijain2305/949/head     -> origin/gh/anijain2305/949/head
2025-12-04T09:33:41.4306762Z  * [new branch]              gh/anijain2305/949/orig     -> origin/gh/anijain2305/949/orig
2025-12-04T09:33:41.4308649Z  * [new branch]              gh/anijain2305/950/base     -> origin/gh/anijain2305/950/base
2025-12-04T09:33:41.4309879Z  * [new branch]              gh/anijain2305/950/head     -> origin/gh/anijain2305/950/head
2025-12-04T09:33:41.4311430Z  * [new branch]              gh/anijain2305/950/orig     -> origin/gh/anijain2305/950/orig
2025-12-04T09:33:41.4313241Z  * [new branch]              gh/anijain2305/951/base     -> origin/gh/anijain2305/951/base
2025-12-04T09:33:41.4314457Z  * [new branch]              gh/anijain2305/951/head     -> origin/gh/anijain2305/951/head
2025-12-04T09:33:41.4315796Z  * [new branch]              gh/anijain2305/951/orig     -> origin/gh/anijain2305/951/orig
2025-12-04T09:33:41.4317733Z  * [new branch]              gh/anijain2305/952/base     -> origin/gh/anijain2305/952/base
2025-12-04T09:33:41.4318981Z  * [new branch]              gh/anijain2305/952/head     -> origin/gh/anijain2305/952/head
2025-12-04T09:33:41.4320260Z  * [new branch]              gh/anijain2305/952/orig     -> origin/gh/anijain2305/952/orig
2025-12-04T09:33:41.4322093Z  * [new branch]              gh/anijain2305/953/base     -> origin/gh/anijain2305/953/base
2025-12-04T09:33:41.4323407Z  * [new branch]              gh/anijain2305/953/head     -> origin/gh/anijain2305/953/head
2025-12-04T09:33:41.4324660Z  * [new branch]              gh/anijain2305/953/orig     -> origin/gh/anijain2305/953/orig
2025-12-04T09:33:41.4326545Z  * [new branch]              gh/anijain2305/954/base     -> origin/gh/anijain2305/954/base
2025-12-04T09:33:41.4327835Z  * [new branch]              gh/anijain2305/954/head     -> origin/gh/anijain2305/954/head
2025-12-04T09:33:41.4329126Z  * [new branch]              gh/anijain2305/954/orig     -> origin/gh/anijain2305/954/orig
2025-12-04T09:33:41.4331019Z  * [new branch]              gh/anijain2305/955/base     -> origin/gh/anijain2305/955/base
2025-12-04T09:33:41.4332332Z  * [new branch]              gh/anijain2305/955/head     -> origin/gh/anijain2305/955/head
2025-12-04T09:33:41.4333580Z  * [new branch]              gh/anijain2305/955/orig     -> origin/gh/anijain2305/955/orig
2025-12-04T09:33:41.4335606Z  * [new branch]              gh/anijain2305/956/base     -> origin/gh/anijain2305/956/base
2025-12-04T09:33:41.4337142Z  * [new branch]              gh/anijain2305/956/head     -> origin/gh/anijain2305/956/head
2025-12-04T09:33:41.4338101Z  * [new branch]              gh/anijain2305/956/orig     -> origin/gh/anijain2305/956/orig
2025-12-04T09:33:41.4340068Z  * [new branch]              gh/anijain2305/957/base     -> origin/gh/anijain2305/957/base
2025-12-04T09:33:41.4341321Z  * [new branch]              gh/anijain2305/957/head     -> origin/gh/anijain2305/957/head
2025-12-04T09:33:41.4342611Z  * [new branch]              gh/anijain2305/957/orig     -> origin/gh/anijain2305/957/orig
2025-12-04T09:33:41.4344421Z  * [new branch]              gh/anijain2305/958/base     -> origin/gh/anijain2305/958/base
2025-12-04T09:33:41.4345849Z  * [new branch]              gh/anijain2305/958/head     -> origin/gh/anijain2305/958/head
2025-12-04T09:33:41.4347047Z  * [new branch]              gh/anijain2305/958/orig     -> origin/gh/anijain2305/958/orig
2025-12-04T09:33:41.4348888Z  * [new branch]              gh/anijain2305/959/base     -> origin/gh/anijain2305/959/base
2025-12-04T09:33:41.4350098Z  * [new branch]              gh/anijain2305/959/head     -> origin/gh/anijain2305/959/head
2025-12-04T09:33:41.4351401Z  * [new branch]              gh/anijain2305/959/orig     -> origin/gh/anijain2305/959/orig
2025-12-04T09:33:41.4353429Z  * [new branch]              gh/anijain2305/960/base     -> origin/gh/anijain2305/960/base
2025-12-04T09:33:41.4354729Z  * [new branch]              gh/anijain2305/960/head     -> origin/gh/anijain2305/960/head
2025-12-04T09:33:41.4356000Z  * [new branch]              gh/anijain2305/960/orig     -> origin/gh/anijain2305/960/orig
2025-12-04T09:33:41.4357943Z  * [new branch]              gh/anijain2305/961/base     -> origin/gh/anijain2305/961/base
2025-12-04T09:33:41.4359190Z  * [new branch]              gh/anijain2305/961/head     -> origin/gh/anijain2305/961/head
2025-12-04T09:33:41.4360505Z  * [new branch]              gh/anijain2305/961/orig     -> origin/gh/anijain2305/961/orig
2025-12-04T09:33:41.4362413Z  * [new branch]              gh/anijain2305/962/base     -> origin/gh/anijain2305/962/base
2025-12-04T09:33:41.4363651Z  * [new branch]              gh/anijain2305/962/head     -> origin/gh/anijain2305/962/head
2025-12-04T09:33:41.4364956Z  * [new branch]              gh/anijain2305/962/orig     -> origin/gh/anijain2305/962/orig
2025-12-04T09:33:41.4367257Z  * [new branch]              gh/anijain2305/963/base     -> origin/gh/anijain2305/963/base
2025-12-04T09:33:41.4368743Z  * [new branch]              gh/anijain2305/963/head     -> origin/gh/anijain2305/963/head
2025-12-04T09:33:41.4370021Z  * [new branch]              gh/anijain2305/963/orig     -> origin/gh/anijain2305/963/orig
2025-12-04T09:33:41.4371905Z  * [new branch]              gh/anijain2305/964/base     -> origin/gh/anijain2305/964/base
2025-12-04T09:33:41.4373192Z  * [new branch]              gh/anijain2305/964/head     -> origin/gh/anijain2305/964/head
2025-12-04T09:33:41.4374493Z  * [new branch]              gh/anijain2305/964/orig     -> origin/gh/anijain2305/964/orig
2025-12-04T09:33:41.4376322Z  * [new branch]              gh/anijain2305/965/base     -> origin/gh/anijain2305/965/base
2025-12-04T09:33:41.4377530Z  * [new branch]              gh/anijain2305/965/head     -> origin/gh/anijain2305/965/head
2025-12-04T09:33:41.4379048Z  * [new branch]              gh/anijain2305/965/orig     -> origin/gh/anijain2305/965/orig
2025-12-04T09:33:41.4381137Z  * [new branch]              gh/anijain2305/966/base     -> origin/gh/anijain2305/966/base
2025-12-04T09:33:41.4382487Z  * [new branch]              gh/anijain2305/966/head     -> origin/gh/anijain2305/966/head
2025-12-04T09:33:41.4383645Z  * [new branch]              gh/anijain2305/966/orig     -> origin/gh/anijain2305/966/orig
2025-12-04T09:33:41.4385521Z  * [new branch]              gh/anijain2305/967/base     -> origin/gh/anijain2305/967/base
2025-12-04T09:33:41.4386752Z  * [new branch]              gh/anijain2305/967/head     -> origin/gh/anijain2305/967/head
2025-12-04T09:33:41.4388284Z  * [new branch]              gh/anijain2305/967/orig     -> origin/gh/anijain2305/967/orig
2025-12-04T09:33:41.4389991Z  * [new branch]              gh/anijain2305/968/base     -> origin/gh/anijain2305/968/base
2025-12-04T09:33:41.4391282Z  * [new branch]              gh/anijain2305/968/head     -> origin/gh/anijain2305/968/head
2025-12-04T09:33:41.4392581Z  * [new branch]              gh/anijain2305/968/orig     -> origin/gh/anijain2305/968/orig
2025-12-04T09:33:41.4394358Z  * [new branch]              gh/anijain2305/969/base     -> origin/gh/anijain2305/969/base
2025-12-04T09:33:41.4395631Z  * [new branch]              gh/anijain2305/969/head     -> origin/gh/anijain2305/969/head
2025-12-04T09:33:41.4397054Z  * [new branch]              gh/anijain2305/969/orig     -> origin/gh/anijain2305/969/orig
2025-12-04T09:33:41.4398852Z  * [new branch]              gh/anijain2305/970/base     -> origin/gh/anijain2305/970/base
2025-12-04T09:33:41.4400217Z  * [new branch]              gh/anijain2305/970/head     -> origin/gh/anijain2305/970/head
2025-12-04T09:33:41.4401730Z  * [new branch]              gh/anijain2305/970/orig     -> origin/gh/anijain2305/970/orig
2025-12-04T09:33:41.4404109Z  * [new branch]              gh/anjali411/216/base       -> origin/gh/anjali411/216/base
2025-12-04T09:33:41.4405295Z  * [new branch]              gh/anjali411/216/head       -> origin/gh/anjali411/216/head
2025-12-04T09:33:41.4406593Z  * [new branch]              gh/anjali411/216/orig       -> origin/gh/anjali411/216/orig
2025-12-04T09:33:41.4409131Z  * [new branch]              gh/anshul-si/1/base         -> origin/gh/anshul-si/1/base
2025-12-04T09:33:41.4410346Z  * [new branch]              gh/anshul-si/1/head         -> origin/gh/anshul-si/1/head
2025-12-04T09:33:41.4412109Z  * [new branch]              gh/anshul-si/2/base         -> origin/gh/anshul-si/2/base
2025-12-04T09:33:41.4413687Z  * [new branch]              gh/anshul-si/2/head         -> origin/gh/anshul-si/2/head
2025-12-04T09:33:41.4414782Z  * [new branch]              gh/anshul-si/3/base         -> origin/gh/anshul-si/3/base
2025-12-04T09:33:41.4416030Z  * [new branch]              gh/anshul-si/3/head         -> origin/gh/anshul-si/3/head
2025-12-04T09:33:41.4417661Z  * [new branch]              gh/anshul-si/4/base         -> origin/gh/anshul-si/4/base
2025-12-04T09:33:41.4418777Z  * [new branch]              gh/anshul-si/4/head         -> origin/gh/anshul-si/4/head
2025-12-04T09:33:41.4420404Z  * [new branch]              gh/anshul-si/5/base         -> origin/gh/anshul-si/5/base
2025-12-04T09:33:41.4421617Z  * [new branch]              gh/anshul-si/5/head         -> origin/gh/anshul-si/5/head
2025-12-04T09:33:41.4423661Z  * [new branch]              gh/anshul-si/53/base        -> origin/gh/anshul-si/53/base
2025-12-04T09:33:41.4424900Z  * [new branch]              gh/anshul-si/53/head        -> origin/gh/anshul-si/53/head
2025-12-04T09:33:41.4426923Z  * [new branch]              gh/anshul-si/58/base        -> origin/gh/anshul-si/58/base
2025-12-04T09:33:41.4428109Z  * [new branch]              gh/anshul-si/58/head        -> origin/gh/anshul-si/58/head
2025-12-04T09:33:41.4429763Z  * [new branch]              gh/anshul-si/66/base        -> origin/gh/anshul-si/66/base
2025-12-04T09:33:41.4431020Z  * [new branch]              gh/anshul-si/66/head        -> origin/gh/anshul-si/66/head
2025-12-04T09:33:41.4432272Z  * [new branch]              gh/anshul-si/66/orig        -> origin/gh/anshul-si/66/orig
2025-12-04T09:33:41.4433928Z  * [new branch]              gh/anshul-si/67/base        -> origin/gh/anshul-si/67/base
2025-12-04T09:33:41.4435111Z  * [new branch]              gh/anshul-si/67/head        -> origin/gh/anshul-si/67/head
2025-12-04T09:33:41.4436381Z  * [new branch]              gh/anshul-si/67/orig        -> origin/gh/anshul-si/67/orig
2025-12-04T09:33:41.4438432Z  * [new branch]              gh/anshul-si/68/base        -> origin/gh/anshul-si/68/base
2025-12-04T09:33:41.4440107Z  * [new branch]              gh/anshul-si/68/head        -> origin/gh/anshul-si/68/head
2025-12-04T09:33:41.4441252Z  * [new branch]              gh/anshul-si/68/orig        -> origin/gh/anshul-si/68/orig
2025-12-04T09:33:41.4443586Z  * [new branch]              gh/anshul-si/69/base        -> origin/gh/anshul-si/69/base
2025-12-04T09:33:41.4444721Z  * [new branch]              gh/anshul-si/69/head        -> origin/gh/anshul-si/69/head
2025-12-04T09:33:41.4446149Z  * [new branch]              gh/anshul-si/69/orig        -> origin/gh/anshul-si/69/orig
2025-12-04T09:33:41.4447857Z  * [new branch]              gh/anshul-si/70/base        -> origin/gh/anshul-si/70/base
2025-12-04T09:33:41.4449109Z  * [new branch]              gh/anshul-si/70/head        -> origin/gh/anshul-si/70/head
2025-12-04T09:33:41.4450960Z  * [new branch]              gh/anshul-si/70/orig        -> origin/gh/anshul-si/70/orig
2025-12-04T09:33:41.4452607Z  * [new branch]              gh/anshul-si/71/base        -> origin/gh/anshul-si/71/base
2025-12-04T09:33:41.4453880Z  * [new branch]              gh/anshul-si/71/head        -> origin/gh/anshul-si/71/head
2025-12-04T09:33:41.4455151Z  * [new branch]              gh/anshul-si/71/orig        -> origin/gh/anshul-si/71/orig
2025-12-04T09:33:41.4457004Z  * [new branch]              gh/anshul-si/72/base        -> origin/gh/anshul-si/72/base
2025-12-04T09:33:41.4458268Z  * [new branch]              gh/anshul-si/72/head        -> origin/gh/anshul-si/72/head
2025-12-04T09:33:41.4459572Z  * [new branch]              gh/anshul-si/72/orig        -> origin/gh/anshul-si/72/orig
2025-12-04T09:33:41.4461485Z  * [new branch]              gh/anshul-si/73/base        -> origin/gh/anshul-si/73/base
2025-12-04T09:33:41.4462711Z  * [new branch]              gh/anshul-si/73/head        -> origin/gh/anshul-si/73/head
2025-12-04T09:33:41.4464000Z  * [new branch]              gh/anshul-si/73/orig        -> origin/gh/anshul-si/73/orig
2025-12-04T09:33:41.4466354Z  * [new branch]              gh/aorenste/132/base        -> origin/gh/aorenste/132/base
2025-12-04T09:33:41.4467575Z  * [new branch]              gh/aorenste/132/head        -> origin/gh/aorenste/132/head
2025-12-04T09:33:41.4469592Z  * [new branch]              gh/aorenste/134/base        -> origin/gh/aorenste/134/base
2025-12-04T09:33:41.4470989Z  * [new branch]              gh/aorenste/134/head        -> origin/gh/aorenste/134/head
2025-12-04T09:33:41.4472265Z  * [new branch]              gh/aorenste/134/orig        -> origin/gh/aorenste/134/orig
2025-12-04T09:33:41.4474224Z  * [new branch]              gh/aorenste/139/base        -> origin/gh/aorenste/139/base
2025-12-04T09:33:41.4475494Z  * [new branch]              gh/aorenste/139/head        -> origin/gh/aorenste/139/head
2025-12-04T09:33:41.4476812Z  * [new branch]              gh/aorenste/139/orig        -> origin/gh/aorenste/139/orig
2025-12-04T09:33:41.4478723Z  * [new branch]              gh/aorenste/141/base        -> origin/gh/aorenste/141/base
2025-12-04T09:33:41.4479916Z  * [new branch]              gh/aorenste/141/head        -> origin/gh/aorenste/141/head
2025-12-04T09:33:41.4482136Z  * [new branch]              gh/aorenste/145/base        -> origin/gh/aorenste/145/base
2025-12-04T09:33:41.4483451Z  * [new branch]              gh/aorenste/145/head        -> origin/gh/aorenste/145/head
2025-12-04T09:33:41.4485036Z  * [new branch]              gh/aorenste/145/orig        -> origin/gh/aorenste/145/orig
2025-12-04T09:33:41.4486883Z  * [new branch]              gh/aorenste/146/base        -> origin/gh/aorenste/146/base
2025-12-04T09:33:41.4488192Z  * [new branch]              gh/aorenste/146/head        -> origin/gh/aorenste/146/head
2025-12-04T09:33:41.4489498Z  * [new branch]              gh/aorenste/146/orig        -> origin/gh/aorenste/146/orig
2025-12-04T09:33:41.4491488Z  * [new branch]              gh/aorenste/147/base        -> origin/gh/aorenste/147/base
2025-12-04T09:33:41.4492836Z  * [new branch]              gh/aorenste/147/head        -> origin/gh/aorenste/147/head
2025-12-04T09:33:41.4494281Z  * [new branch]              gh/aorenste/147/orig        -> origin/gh/aorenste/147/orig
2025-12-04T09:33:41.4496137Z  * [new branch]              gh/aorenste/148/base        -> origin/gh/aorenste/148/base
2025-12-04T09:33:41.4497372Z  * [new branch]              gh/aorenste/148/head        -> origin/gh/aorenste/148/head
2025-12-04T09:33:41.4498848Z  * [new branch]              gh/aorenste/148/orig        -> origin/gh/aorenste/148/orig
2025-12-04T09:33:41.4500584Z  * [new branch]              gh/aorenste/149/base        -> origin/gh/aorenste/149/base
2025-12-04T09:33:41.4505043Z  * [new branch]              gh/aorenste/149/head        -> origin/gh/aorenste/149/head
2025-12-04T09:33:41.4506218Z  * [new branch]              gh/aorenste/149/orig        -> origin/gh/aorenste/149/orig
2025-12-04T09:33:41.4508237Z  * [new branch]              gh/aorenste/150/base        -> origin/gh/aorenste/150/base
2025-12-04T09:33:41.4509340Z  * [new branch]              gh/aorenste/150/head        -> origin/gh/aorenste/150/head
2025-12-04T09:33:41.4510803Z  * [new branch]              gh/aorenste/150/orig        -> origin/gh/aorenste/150/orig
2025-12-04T09:33:41.4512373Z  * [new branch]              gh/aorenste/151/base        -> origin/gh/aorenste/151/base
2025-12-04T09:33:41.4513593Z  * [new branch]              gh/aorenste/151/head        -> origin/gh/aorenste/151/head
2025-12-04T09:33:41.4514906Z  * [new branch]              gh/aorenste/151/orig        -> origin/gh/aorenste/151/orig
2025-12-04T09:33:41.4516805Z  * [new branch]              gh/aorenste/152/base        -> origin/gh/aorenste/152/base
2025-12-04T09:33:41.4517944Z  * [new branch]              gh/aorenste/152/head        -> origin/gh/aorenste/152/head
2025-12-04T09:33:41.4519426Z  * [new branch]              gh/aorenste/152/orig        -> origin/gh/aorenste/152/orig
2025-12-04T09:33:41.4521056Z  * [new branch]              gh/aorenste/153/base        -> origin/gh/aorenste/153/base
2025-12-04T09:33:41.4522269Z  * [new branch]              gh/aorenste/153/head        -> origin/gh/aorenste/153/head
2025-12-04T09:33:41.4523666Z  * [new branch]              gh/aorenste/153/orig        -> origin/gh/aorenste/153/orig
2025-12-04T09:33:41.4525352Z  * [new branch]              gh/aorenste/154/base        -> origin/gh/aorenste/154/base
2025-12-04T09:33:41.4527063Z  * [new branch]              gh/aorenste/154/head        -> origin/gh/aorenste/154/head
2025-12-04T09:33:41.4527927Z  * [new branch]              gh/aorenste/154/orig        -> origin/gh/aorenste/154/orig
2025-12-04T09:33:41.4529561Z  * [new branch]              gh/aorenste/155/base        -> origin/gh/aorenste/155/base
2025-12-04T09:33:41.4530786Z  * [new branch]              gh/aorenste/155/head        -> origin/gh/aorenste/155/head
2025-12-04T09:33:41.4531995Z  * [new branch]              gh/aorenste/155/orig        -> origin/gh/aorenste/155/orig
2025-12-04T09:33:41.4533811Z  * [new branch]              gh/aorenste/156/base        -> origin/gh/aorenste/156/base
2025-12-04T09:33:41.4534823Z  * [new branch]              gh/aorenste/156/head        -> origin/gh/aorenste/156/head
2025-12-04T09:33:41.4536081Z  * [new branch]              gh/aorenste/156/orig        -> origin/gh/aorenste/156/orig
2025-12-04T09:33:41.4538228Z  * [new branch]              gh/aorenste/157/base        -> origin/gh/aorenste/157/base
2025-12-04T09:33:41.4539470Z  * [new branch]              gh/aorenste/157/head        -> origin/gh/aorenste/157/head
2025-12-04T09:33:41.4540753Z  * [new branch]              gh/aorenste/157/orig        -> origin/gh/aorenste/157/orig
2025-12-04T09:33:41.4542426Z  * [new branch]              gh/aorenste/158/base        -> origin/gh/aorenste/158/base
2025-12-04T09:33:41.4543655Z  * [new branch]              gh/aorenste/158/head        -> origin/gh/aorenste/158/head
2025-12-04T09:33:41.4544820Z  * [new branch]              gh/aorenste/158/orig        -> origin/gh/aorenste/158/orig
2025-12-04T09:33:41.4546533Z  * [new branch]              gh/aorenste/159/base        -> origin/gh/aorenste/159/base
2025-12-04T09:33:41.4547761Z  * [new branch]              gh/aorenste/159/head        -> origin/gh/aorenste/159/head
2025-12-04T09:33:41.4548918Z  * [new branch]              gh/aorenste/159/orig        -> origin/gh/aorenste/159/orig
2025-12-04T09:33:41.4551102Z  * [new branch]              gh/avikchaudhuri/1/base     -> origin/gh/avikchaudhuri/1/base
2025-12-04T09:33:41.4552453Z  * [new branch]              gh/avikchaudhuri/1/head     -> origin/gh/avikchaudhuri/1/head
2025-12-04T09:33:41.4554090Z  * [new branch]              gh/avikchaudhuri/2/base     -> origin/gh/avikchaudhuri/2/base
2025-12-04T09:33:41.4555297Z  * [new branch]              gh/avikchaudhuri/2/head     -> origin/gh/avikchaudhuri/2/head
2025-12-04T09:33:41.4556494Z  * [new branch]              gh/avikchaudhuri/2/orig     -> origin/gh/avikchaudhuri/2/orig
2025-12-04T09:33:41.4559104Z  * [new branch]              gh/bdhirsh/666/base         -> origin/gh/bdhirsh/666/base
2025-12-04T09:33:41.4560309Z  * [new branch]              gh/bdhirsh/666/head         -> origin/gh/bdhirsh/666/head
2025-12-04T09:33:41.4561565Z  * [new branch]              gh/bdhirsh/666/orig         -> origin/gh/bdhirsh/666/orig
2025-12-04T09:33:41.4563503Z  * [new branch]              gh/bdhirsh/668/base         -> origin/gh/bdhirsh/668/base
2025-12-04T09:33:41.4564718Z  * [new branch]              gh/bdhirsh/668/head         -> origin/gh/bdhirsh/668/head
2025-12-04T09:33:41.4566036Z  * [new branch]              gh/bdhirsh/668/orig         -> origin/gh/bdhirsh/668/orig
2025-12-04T09:33:41.4568040Z  * [new branch]              gh/bdhirsh/669/base         -> origin/gh/bdhirsh/669/base
2025-12-04T09:33:41.4569247Z  * [new branch]              gh/bdhirsh/669/head         -> origin/gh/bdhirsh/669/head
2025-12-04T09:33:41.4570711Z  * [new branch]              gh/bdhirsh/669/orig         -> origin/gh/bdhirsh/669/orig
2025-12-04T09:33:41.4572645Z  * [new branch]              gh/bdhirsh/670/base         -> origin/gh/bdhirsh/670/base
2025-12-04T09:33:41.4573935Z  * [new branch]              gh/bdhirsh/670/head         -> origin/gh/bdhirsh/670/head
2025-12-04T09:33:41.4575224Z  * [new branch]              gh/bdhirsh/670/orig         -> origin/gh/bdhirsh/670/orig
2025-12-04T09:33:41.4577143Z  * [new branch]              gh/bdhirsh/672/base         -> origin/gh/bdhirsh/672/base
2025-12-04T09:33:41.4578269Z  * [new branch]              gh/bdhirsh/672/head         -> origin/gh/bdhirsh/672/head
2025-12-04T09:33:41.4579532Z  * [new branch]              gh/bdhirsh/672/orig         -> origin/gh/bdhirsh/672/orig
2025-12-04T09:33:41.4581624Z  * [new branch]              gh/bdhirsh/675/base         -> origin/gh/bdhirsh/675/base
2025-12-04T09:33:41.4583086Z  * [new branch]              gh/bdhirsh/675/head         -> origin/gh/bdhirsh/675/head
2025-12-04T09:33:41.4584271Z  * [new branch]              gh/bdhirsh/675/orig         -> origin/gh/bdhirsh/675/orig
2025-12-04T09:33:41.4586132Z  * [new branch]              gh/bdhirsh/676/base         -> origin/gh/bdhirsh/676/base
2025-12-04T09:33:41.4587633Z  * [new branch]              gh/bdhirsh/676/head         -> origin/gh/bdhirsh/676/head
2025-12-04T09:33:41.4588773Z  * [new branch]              gh/bdhirsh/676/orig         -> origin/gh/bdhirsh/676/orig
2025-12-04T09:33:41.4590619Z  * [new branch]              gh/bdhirsh/677/base         -> origin/gh/bdhirsh/677/base
2025-12-04T09:33:41.4592327Z  * [new branch]              gh/bdhirsh/677/head         -> origin/gh/bdhirsh/677/head
2025-12-04T09:33:41.4593668Z  * [new branch]              gh/bdhirsh/677/orig         -> origin/gh/bdhirsh/677/orig
2025-12-04T09:33:41.4595550Z  * [new branch]              gh/bdhirsh/678/base         -> origin/gh/bdhirsh/678/base
2025-12-04T09:33:41.4596952Z  * [new branch]              gh/bdhirsh/678/head         -> origin/gh/bdhirsh/678/head
2025-12-04T09:33:41.4598283Z  * [new branch]              gh/bdhirsh/678/orig         -> origin/gh/bdhirsh/678/orig
2025-12-04T09:33:41.4600166Z  * [new branch]              gh/bdhirsh/679/base         -> origin/gh/bdhirsh/679/base
2025-12-04T09:33:41.4601763Z  * [new branch]              gh/bdhirsh/679/head         -> origin/gh/bdhirsh/679/head
2025-12-04T09:33:41.4603237Z  * [new branch]              gh/bdhirsh/679/orig         -> origin/gh/bdhirsh/679/orig
2025-12-04T09:33:41.4605004Z  * [new branch]              gh/bdhirsh/680/base         -> origin/gh/bdhirsh/680/base
2025-12-04T09:33:41.4606506Z  * [new branch]              gh/bdhirsh/680/head         -> origin/gh/bdhirsh/680/head
2025-12-04T09:33:41.4607777Z  * [new branch]              gh/bdhirsh/680/orig         -> origin/gh/bdhirsh/680/orig
2025-12-04T09:33:41.4609339Z  * [new branch]              gh/bdhirsh/681/base         -> origin/gh/bdhirsh/681/base
2025-12-04T09:33:41.4610723Z  * [new branch]              gh/bdhirsh/681/head         -> origin/gh/bdhirsh/681/head
2025-12-04T09:33:41.4612133Z  * [new branch]              gh/bdhirsh/681/orig         -> origin/gh/bdhirsh/681/orig
2025-12-04T09:33:41.4614225Z  * [new branch]              gh/benjaminglass1/101/base  -> origin/gh/benjaminglass1/101/base
2025-12-04T09:33:41.4615535Z  * [new branch]              gh/benjaminglass1/101/head  -> origin/gh/benjaminglass1/101/head
2025-12-04T09:33:41.4616833Z  * [new branch]              gh/benjaminglass1/101/orig  -> origin/gh/benjaminglass1/101/orig
2025-12-04T09:33:41.4618612Z  * [new branch]              gh/benjaminglass1/102/base  -> origin/gh/benjaminglass1/102/base
2025-12-04T09:33:41.4619918Z  * [new branch]              gh/benjaminglass1/102/head  -> origin/gh/benjaminglass1/102/head
2025-12-04T09:33:41.4621188Z  * [new branch]              gh/benjaminglass1/102/orig  -> origin/gh/benjaminglass1/102/orig
2025-12-04T09:33:41.4623077Z  * [new branch]              gh/benjaminglass1/106/base  -> origin/gh/benjaminglass1/106/base
2025-12-04T09:33:41.4624365Z  * [new branch]              gh/benjaminglass1/106/head  -> origin/gh/benjaminglass1/106/head
2025-12-04T09:33:41.4625679Z  * [new branch]              gh/benjaminglass1/106/orig  -> origin/gh/benjaminglass1/106/orig
2025-12-04T09:33:41.4627369Z  * [new branch]              gh/benjaminglass1/107/base  -> origin/gh/benjaminglass1/107/base
2025-12-04T09:33:41.4628663Z  * [new branch]              gh/benjaminglass1/107/head  -> origin/gh/benjaminglass1/107/head
2025-12-04T09:33:41.4629969Z  * [new branch]              gh/benjaminglass1/107/orig  -> origin/gh/benjaminglass1/107/orig
2025-12-04T09:33:41.4631682Z  * [new branch]              gh/benjaminglass1/108/base  -> origin/gh/benjaminglass1/108/base
2025-12-04T09:33:41.4632958Z  * [new branch]              gh/benjaminglass1/108/head  -> origin/gh/benjaminglass1/108/head
2025-12-04T09:33:41.4634239Z  * [new branch]              gh/benjaminglass1/108/orig  -> origin/gh/benjaminglass1/108/orig
2025-12-04T09:33:41.4635940Z  * [new branch]              gh/benjaminglass1/109/base  -> origin/gh/benjaminglass1/109/base
2025-12-04T09:33:41.4637202Z  * [new branch]              gh/benjaminglass1/109/head  -> origin/gh/benjaminglass1/109/head
2025-12-04T09:33:41.4638551Z  * [new branch]              gh/benjaminglass1/109/orig  -> origin/gh/benjaminglass1/109/orig
2025-12-04T09:33:41.4640311Z  * [new branch]              gh/benjaminglass1/97/base   -> origin/gh/benjaminglass1/97/base
2025-12-04T09:33:41.4641574Z  * [new branch]              gh/benjaminglass1/97/head   -> origin/gh/benjaminglass1/97/head
2025-12-04T09:33:41.4642981Z  * [new branch]              gh/benjaminglass1/97/orig   -> origin/gh/benjaminglass1/97/orig
2025-12-04T09:33:41.4644979Z  * [new branch]              gh/bobrenjc93/570/base      -> origin/gh/bobrenjc93/570/base
2025-12-04T09:33:41.4646317Z  * [new branch]              gh/bobrenjc93/570/head      -> origin/gh/bobrenjc93/570/head
2025-12-04T09:33:41.4647599Z  * [new branch]              gh/bobrenjc93/570/orig      -> origin/gh/bobrenjc93/570/orig
2025-12-04T09:33:41.4649187Z  * [new branch]              gh/bobrenjc93/604/base      -> origin/gh/bobrenjc93/604/base
2025-12-04T09:33:41.4650555Z  * [new branch]              gh/bobrenjc93/604/head      -> origin/gh/bobrenjc93/604/head
2025-12-04T09:33:41.4651798Z  * [new branch]              gh/bobrenjc93/604/orig      -> origin/gh/bobrenjc93/604/orig
2025-12-04T09:33:41.4653495Z  * [new branch]              gh/bobrenjc93/638/base      -> origin/gh/bobrenjc93/638/base
2025-12-04T09:33:41.4654790Z  * [new branch]              gh/bobrenjc93/638/head      -> origin/gh/bobrenjc93/638/head
2025-12-04T09:33:41.4656067Z  * [new branch]              gh/bobrenjc93/638/orig      -> origin/gh/bobrenjc93/638/orig
2025-12-04T09:33:41.4657902Z  * [new branch]              gh/bobrenjc93/653/base      -> origin/gh/bobrenjc93/653/base
2025-12-04T09:33:41.4659189Z  * [new branch]              gh/bobrenjc93/653/head      -> origin/gh/bobrenjc93/653/head
2025-12-04T09:33:41.4660459Z  * [new branch]              gh/bobrenjc93/653/orig      -> origin/gh/bobrenjc93/653/orig
2025-12-04T09:33:41.4662304Z  * [new branch]              gh/bobrenjc93/654/base      -> origin/gh/bobrenjc93/654/base
2025-12-04T09:33:41.4663627Z  * [new branch]              gh/bobrenjc93/654/head      -> origin/gh/bobrenjc93/654/head
2025-12-04T09:33:41.4664950Z  * [new branch]              gh/bobrenjc93/654/orig      -> origin/gh/bobrenjc93/654/orig
2025-12-04T09:33:41.4666628Z  * [new branch]              gh/bobrenjc93/657/base      -> origin/gh/bobrenjc93/657/base
2025-12-04T09:33:41.4667858Z  * [new branch]              gh/bobrenjc93/657/head      -> origin/gh/bobrenjc93/657/head
2025-12-04T09:33:41.4669104Z  * [new branch]              gh/bobrenjc93/657/orig      -> origin/gh/bobrenjc93/657/orig
2025-12-04T09:33:41.4670885Z  * [new branch]              gh/bobrenjc93/672/base      -> origin/gh/bobrenjc93/672/base
2025-12-04T09:33:41.4672058Z  * [new branch]              gh/bobrenjc93/672/head      -> origin/gh/bobrenjc93/672/head
2025-12-04T09:33:41.4673410Z  * [new branch]              gh/bobrenjc93/672/orig      -> origin/gh/bobrenjc93/672/orig
2025-12-04T09:33:41.4675233Z  * [new branch]              gh/bobrenjc93/679/base      -> origin/gh/bobrenjc93/679/base
2025-12-04T09:33:41.4676795Z  * [new branch]              gh/bobrenjc93/679/head      -> origin/gh/bobrenjc93/679/head
2025-12-04T09:33:41.4678041Z  * [new branch]              gh/bobrenjc93/679/orig      -> origin/gh/bobrenjc93/679/orig
2025-12-04T09:33:41.4679817Z  * [new branch]              gh/bobrenjc93/680/base      -> origin/gh/bobrenjc93/680/base
2025-12-04T09:33:41.4681102Z  * [new branch]              gh/bobrenjc93/680/head      -> origin/gh/bobrenjc93/680/head
2025-12-04T09:33:41.4683014Z  * [new branch]              gh/bobrenjc93/680/orig      -> origin/gh/bobrenjc93/680/orig
2025-12-04T09:33:41.4684581Z  * [new branch]              gh/bobrenjc93/681/base      -> origin/gh/bobrenjc93/681/base
2025-12-04T09:33:41.4685868Z  * [new branch]              gh/bobrenjc93/681/head      -> origin/gh/bobrenjc93/681/head
2025-12-04T09:33:41.4687203Z  * [new branch]              gh/bobrenjc93/681/orig      -> origin/gh/bobrenjc93/681/orig
2025-12-04T09:33:41.4688751Z  * [new branch]              gh/bobrenjc93/682/base      -> origin/gh/bobrenjc93/682/base
2025-12-04T09:33:41.4690037Z  * [new branch]              gh/bobrenjc93/682/head      -> origin/gh/bobrenjc93/682/head
2025-12-04T09:33:41.4691308Z  * [new branch]              gh/bobrenjc93/682/orig      -> origin/gh/bobrenjc93/682/orig
2025-12-04T09:33:41.4693097Z  * [new branch]              gh/bobrenjc93/683/base      -> origin/gh/bobrenjc93/683/base
2025-12-04T09:33:41.4694441Z  * [new branch]              gh/bobrenjc93/683/head      -> origin/gh/bobrenjc93/683/head
2025-12-04T09:33:41.4695671Z  * [new branch]              gh/bobrenjc93/683/orig      -> origin/gh/bobrenjc93/683/orig
2025-12-04T09:33:41.4697417Z  * [new branch]              gh/bobrenjc93/684/base      -> origin/gh/bobrenjc93/684/base
2025-12-04T09:33:41.4698921Z  * [new branch]              gh/bobrenjc93/684/head      -> origin/gh/bobrenjc93/684/head
2025-12-04T09:33:41.4700430Z  * [new branch]              gh/bobrenjc93/684/orig      -> origin/gh/bobrenjc93/684/orig
2025-12-04T09:33:41.4702333Z  * [new branch]              gh/bobrenjc93/685/base      -> origin/gh/bobrenjc93/685/base
2025-12-04T09:33:41.4703928Z  * [new branch]              gh/bobrenjc93/685/head      -> origin/gh/bobrenjc93/685/head
2025-12-04T09:33:41.4705652Z  * [new branch]              gh/bobrenjc93/685/orig      -> origin/gh/bobrenjc93/685/orig
2025-12-04T09:33:41.4707573Z  * [new branch]              gh/bobrenjc93/686/base      -> origin/gh/bobrenjc93/686/base
2025-12-04T09:33:41.4711527Z  * [new branch]              gh/bobrenjc93/686/head      -> origin/gh/bobrenjc93/686/head
2025-12-04T09:33:41.4711797Z  * [new branch]              gh/bobrenjc93/686/orig      -> origin/gh/bobrenjc93/686/orig
2025-12-04T09:33:41.4712452Z  * [new branch]              gh/bobrenjc93/687/base      -> origin/gh/bobrenjc93/687/base
2025-12-04T09:33:41.4714525Z  * [new branch]              gh/bobrenjc93/687/head      -> origin/gh/bobrenjc93/687/head
2025-12-04T09:33:41.4715184Z  * [new branch]              gh/bobrenjc93/687/orig      -> origin/gh/bobrenjc93/687/orig
2025-12-04T09:33:41.4717546Z  * [new branch]              gh/bobrenjc93/688/base      -> origin/gh/bobrenjc93/688/base
2025-12-04T09:33:41.4718873Z  * [new branch]              gh/bobrenjc93/688/head      -> origin/gh/bobrenjc93/688/head
2025-12-04T09:33:41.4720166Z  * [new branch]              gh/bobrenjc93/688/orig      -> origin/gh/bobrenjc93/688/orig
2025-12-04T09:33:41.4721813Z  * [new branch]              gh/bobrenjc93/689/base      -> origin/gh/bobrenjc93/689/base
2025-12-04T09:33:41.4723351Z  * [new branch]              gh/bobrenjc93/689/head      -> origin/gh/bobrenjc93/689/head
2025-12-04T09:33:41.4724667Z  * [new branch]              gh/bobrenjc93/689/orig      -> origin/gh/bobrenjc93/689/orig
2025-12-04T09:33:41.4726310Z  * [new branch]              gh/bobrenjc93/690/base      -> origin/gh/bobrenjc93/690/base
2025-12-04T09:33:41.4727580Z  * [new branch]              gh/bobrenjc93/690/head      -> origin/gh/bobrenjc93/690/head
2025-12-04T09:33:41.4728902Z  * [new branch]              gh/bobrenjc93/690/orig      -> origin/gh/bobrenjc93/690/orig
2025-12-04T09:33:41.4731512Z  * [new branch]              gh/bobrenjc93/691/base      -> origin/gh/bobrenjc93/691/base
2025-12-04T09:33:41.4733156Z  * [new branch]              gh/bobrenjc93/691/head      -> origin/gh/bobrenjc93/691/head
2025-12-04T09:33:41.4734925Z  * [new branch]              gh/bobrenjc93/691/orig      -> origin/gh/bobrenjc93/691/orig
2025-12-04T09:33:41.4737458Z  * [new branch]              gh/bobrenjc93/692/base      -> origin/gh/bobrenjc93/692/base
2025-12-04T09:33:41.4738773Z  * [new branch]              gh/bobrenjc93/692/head      -> origin/gh/bobrenjc93/692/head
2025-12-04T09:33:41.4740070Z  * [new branch]              gh/bobrenjc93/692/orig      -> origin/gh/bobrenjc93/692/orig
2025-12-04T09:33:41.4741698Z  * [new branch]              gh/bobrenjc93/693/base      -> origin/gh/bobrenjc93/693/base
2025-12-04T09:33:41.4742928Z  * [new branch]              gh/bobrenjc93/693/head      -> origin/gh/bobrenjc93/693/head
2025-12-04T09:33:41.4744305Z  * [new branch]              gh/bobrenjc93/693/orig      -> origin/gh/bobrenjc93/693/orig
2025-12-04T09:33:41.4746131Z  * [new branch]              gh/bobrenjc93/694/base      -> origin/gh/bobrenjc93/694/base
2025-12-04T09:33:41.4747475Z  * [new branch]              gh/bobrenjc93/694/head      -> origin/gh/bobrenjc93/694/head
2025-12-04T09:33:41.4748830Z  * [new branch]              gh/bobrenjc93/694/orig      -> origin/gh/bobrenjc93/694/orig
2025-12-04T09:33:41.4750543Z  * [new branch]              gh/bobrenjc93/695/base      -> origin/gh/bobrenjc93/695/base
2025-12-04T09:33:41.4751850Z  * [new branch]              gh/bobrenjc93/695/head      -> origin/gh/bobrenjc93/695/head
2025-12-04T09:33:41.4753128Z  * [new branch]              gh/bobrenjc93/695/orig      -> origin/gh/bobrenjc93/695/orig
2025-12-04T09:33:41.4755252Z  * [new branch]              gh/c00w/23/base             -> origin/gh/c00w/23/base
2025-12-04T09:33:41.4756582Z  * [new branch]              gh/c00w/23/head             -> origin/gh/c00w/23/head
2025-12-04T09:33:41.4758405Z  * [new branch]              gh/c00w/53/base             -> origin/gh/c00w/53/base
2025-12-04T09:33:41.4759664Z  * [new branch]              gh/c00w/53/head             -> origin/gh/c00w/53/head
2025-12-04T09:33:41.4760930Z  * [new branch]              gh/c00w/53/orig             -> origin/gh/c00w/53/orig
2025-12-04T09:33:41.4762531Z  * [new branch]              gh/c00w/54/base             -> origin/gh/c00w/54/base
2025-12-04T09:33:41.4763919Z  * [new branch]              gh/c00w/54/head             -> origin/gh/c00w/54/head
2025-12-04T09:33:41.4765290Z  * [new branch]              gh/c00w/54/orig             -> origin/gh/c00w/54/orig
2025-12-04T09:33:41.4767090Z  * [new branch]              gh/c00w/56/base             -> origin/gh/c00w/56/base
2025-12-04T09:33:41.4768433Z  * [new branch]              gh/c00w/56/head             -> origin/gh/c00w/56/head
2025-12-04T09:33:41.4769633Z  * [new branch]              gh/c00w/56/orig             -> origin/gh/c00w/56/orig
2025-12-04T09:33:41.4771262Z  * [new branch]              gh/c00w/57/base             -> origin/gh/c00w/57/base
2025-12-04T09:33:41.4772536Z  * [new branch]              gh/c00w/57/head             -> origin/gh/c00w/57/head
2025-12-04T09:33:41.4773841Z  * [new branch]              gh/c00w/57/orig             -> origin/gh/c00w/57/orig
2025-12-04T09:33:41.4775467Z  * [new branch]              gh/c00w/58/base             -> origin/gh/c00w/58/base
2025-12-04T09:33:41.4776728Z  * [new branch]              gh/c00w/58/head             -> origin/gh/c00w/58/head
2025-12-04T09:33:41.4777988Z  * [new branch]              gh/c00w/58/orig             -> origin/gh/c00w/58/orig
2025-12-04T09:33:41.4780057Z  * [new branch]              gh/clee2000/1/base          -> origin/gh/clee2000/1/base
2025-12-04T09:33:41.4781415Z  * [new branch]              gh/clee2000/1/head          -> origin/gh/clee2000/1/head
2025-12-04T09:33:41.4782801Z  * [new branch]              gh/clee2000/1/orig          -> origin/gh/clee2000/1/orig
2025-12-04T09:33:41.4785021Z  * [new branch]              gh/coconutruben/1/base      -> origin/gh/coconutruben/1/base
2025-12-04T09:33:41.4786462Z  * [new branch]              gh/coconutruben/1/head      -> origin/gh/coconutruben/1/head
2025-12-04T09:33:41.4788518Z  * [new branch]              gh/coconutruben/55/base     -> origin/gh/coconutruben/55/base
2025-12-04T09:33:41.4789741Z  * [new branch]              gh/coconutruben/55/head     -> origin/gh/coconutruben/55/head
2025-12-04T09:33:41.4791148Z  * [new branch]              gh/coconutruben/55/orig     -> origin/gh/coconutruben/55/orig
2025-12-04T09:33:41.4792996Z  * [new branch]              gh/coconutruben/57/base     -> origin/gh/coconutruben/57/base
2025-12-04T09:33:41.4794641Z  * [new branch]              gh/coconutruben/57/head     -> origin/gh/coconutruben/57/head
2025-12-04T09:33:41.4796116Z  * [new branch]              gh/coconutruben/57/orig     -> origin/gh/coconutruben/57/orig
2025-12-04T09:33:41.4797927Z  * [new branch]              gh/coconutruben/70/base     -> origin/gh/coconutruben/70/base
2025-12-04T09:33:41.4799293Z  * [new branch]              gh/coconutruben/70/head     -> origin/gh/coconutruben/70/head
2025-12-04T09:33:41.4800740Z  * [new branch]              gh/coconutruben/70/orig     -> origin/gh/coconutruben/70/orig
2025-12-04T09:33:41.4804817Z  * [new branch]              gh/coconutruben/71/base     -> origin/gh/coconutruben/71/base
2025-12-04T09:33:41.4806170Z  * [new branch]              gh/coconutruben/71/head     -> origin/gh/coconutruben/71/head
2025-12-04T09:33:41.4807524Z  * [new branch]              gh/coconutruben/71/orig     -> origin/gh/coconutruben/71/orig
2025-12-04T09:33:41.4809143Z  * [new branch]              gh/coconutruben/72/base     -> origin/gh/coconutruben/72/base
2025-12-04T09:33:41.4810490Z  * [new branch]              gh/coconutruben/72/head     -> origin/gh/coconutruben/72/head
2025-12-04T09:33:41.4812148Z  * [new branch]              gh/coconutruben/72/orig     -> origin/gh/coconutruben/72/orig
2025-12-04T09:33:41.4813499Z  * [new branch]              gh/coconutruben/73/base     -> origin/gh/coconutruben/73/base
2025-12-04T09:33:41.4814850Z  * [new branch]              gh/coconutruben/73/head     -> origin/gh/coconutruben/73/head
2025-12-04T09:33:41.4816096Z  * [new branch]              gh/coconutruben/73/orig     -> origin/gh/coconutruben/73/orig
2025-12-04T09:33:41.4818018Z  * [new branch]              gh/coconutruben/74/base     -> origin/gh/coconutruben/74/base
2025-12-04T09:33:41.4819427Z  * [new branch]              gh/coconutruben/74/head     -> origin/gh/coconutruben/74/head
2025-12-04T09:33:41.4820864Z  * [new branch]              gh/coconutruben/74/orig     -> origin/gh/coconutruben/74/orig
2025-12-04T09:33:41.4822779Z  * [new branch]              gh/coconutruben/79/base     -> origin/gh/coconutruben/79/base
2025-12-04T09:33:41.4824272Z  * [new branch]              gh/coconutruben/79/head     -> origin/gh/coconutruben/79/head
2025-12-04T09:33:41.4825517Z  * [new branch]              gh/coconutruben/79/orig     -> origin/gh/coconutruben/79/orig
2025-12-04T09:33:41.4827211Z  * [new branch]              gh/coconutruben/80/base     -> origin/gh/coconutruben/80/base
2025-12-04T09:33:41.4828567Z  * [new branch]              gh/coconutruben/80/head     -> origin/gh/coconutruben/80/head
2025-12-04T09:33:41.4829922Z  * [new branch]              gh/coconutruben/80/orig     -> origin/gh/coconutruben/80/orig
2025-12-04T09:33:41.4831697Z  * [new branch]              gh/coconutruben/82/base     -> origin/gh/coconutruben/82/base
2025-12-04T09:33:41.4832926Z  * [new branch]              gh/coconutruben/82/head     -> origin/gh/coconutruben/82/head
2025-12-04T09:33:41.4834151Z  * [new branch]              gh/coconutruben/82/orig     -> origin/gh/coconutruben/82/orig
2025-12-04T09:33:41.4836108Z  * [new branch]              gh/coconutruben/83/base     -> origin/gh/coconutruben/83/base
2025-12-04T09:33:41.4837321Z  * [new branch]              gh/coconutruben/83/head     -> origin/gh/coconutruben/83/head
2025-12-04T09:33:41.4838676Z  * [new branch]              gh/coconutruben/83/orig     -> origin/gh/coconutruben/83/orig
2025-12-04T09:33:41.4841052Z  * [new branch]              gh/coconutruben/84/base     -> origin/gh/coconutruben/84/base
2025-12-04T09:33:41.4842490Z  * [new branch]              gh/coconutruben/84/head     -> origin/gh/coconutruben/84/head
2025-12-04T09:33:41.4844268Z  * [new branch]              gh/coconutruben/84/orig     -> origin/gh/coconutruben/84/orig
2025-12-04T09:33:41.4845648Z  * [new branch]              gh/coconutruben/85/base     -> origin/gh/coconutruben/85/base
2025-12-04T09:33:41.4846980Z  * [new branch]              gh/coconutruben/85/head     -> origin/gh/coconutruben/85/head
2025-12-04T09:33:41.4848311Z  * [new branch]              gh/coconutruben/85/orig     -> origin/gh/coconutruben/85/orig
2025-12-04T09:33:41.4850084Z  * [new branch]              gh/coconutruben/86/base     -> origin/gh/coconutruben/86/base
2025-12-04T09:33:41.4851393Z  * [new branch]              gh/coconutruben/86/head     -> origin/gh/coconutruben/86/head
2025-12-04T09:33:41.4852693Z  * [new branch]              gh/coconutruben/86/orig     -> origin/gh/coconutruben/86/orig
2025-12-04T09:33:41.4854803Z  * [new branch]              gh/colinchan15/1/base       -> origin/gh/colinchan15/1/base
2025-12-04T09:33:41.4856277Z  * [new branch]              gh/colinchan15/1/head       -> origin/gh/colinchan15/1/head
2025-12-04T09:33:41.4857887Z  * [new branch]              gh/colinchan15/2/base       -> origin/gh/colinchan15/2/base
2025-12-04T09:33:41.4859078Z  * [new branch]              gh/colinchan15/2/head       -> origin/gh/colinchan15/2/head
2025-12-04T09:33:41.4860594Z  * [new branch]              gh/colinchan15/3/base       -> origin/gh/colinchan15/3/base
2025-12-04T09:33:41.4861872Z  * [new branch]              gh/colinchan15/3/head       -> origin/gh/colinchan15/3/head
2025-12-04T09:33:41.4863373Z  * [new branch]              gh/colinchan15/6/base       -> origin/gh/colinchan15/6/base
2025-12-04T09:33:41.4865155Z  * [new branch]              gh/colinchan15/6/head       -> origin/gh/colinchan15/6/head
2025-12-04T09:33:41.4867179Z  * [new branch]              gh/d4l3k/1/base             -> origin/gh/d4l3k/1/base
2025-12-04T09:33:41.4868452Z  * [new branch]              gh/d4l3k/1/head             -> origin/gh/d4l3k/1/head
2025-12-04T09:33:41.4870139Z  * [new branch]              gh/d4l3k/2/base             -> origin/gh/d4l3k/2/base
2025-12-04T09:33:41.4871502Z  * [new branch]              gh/d4l3k/2/head             -> origin/gh/d4l3k/2/head
2025-12-04T09:33:41.4872777Z  * [new branch]              gh/d4l3k/2/orig             -> origin/gh/d4l3k/2/orig
2025-12-04T09:33:41.4874423Z  * [new branch]              gh/d4l3k/3/base             -> origin/gh/d4l3k/3/base
2025-12-04T09:33:41.4875707Z  * [new branch]              gh/d4l3k/3/head             -> origin/gh/d4l3k/3/head
2025-12-04T09:33:41.4877030Z  * [new branch]              gh/d4l3k/3/orig             -> origin/gh/d4l3k/3/orig
2025-12-04T09:33:41.4878717Z  * [new branch]              gh/d4l3k/4/base             -> origin/gh/d4l3k/4/base
2025-12-04T09:33:41.4879995Z  * [new branch]              gh/d4l3k/4/head             -> origin/gh/d4l3k/4/head
2025-12-04T09:33:41.4881240Z  * [new branch]              gh/d4l3k/4/orig             -> origin/gh/d4l3k/4/orig
2025-12-04T09:33:41.4883064Z  * [new branch]              gh/d4l3k/5/base             -> origin/gh/d4l3k/5/base
2025-12-04T09:33:41.4884331Z  * [new branch]              gh/d4l3k/5/orig             -> origin/gh/d4l3k/5/orig
2025-12-04T09:33:41.4886569Z  * [new branch]              gh/davidberard98/392/base   -> origin/gh/davidberard98/392/base
2025-12-04T09:33:41.4887840Z  * [new branch]              gh/davidberard98/392/head   -> origin/gh/davidberard98/392/head
2025-12-04T09:33:41.4889142Z  * [new branch]              gh/davidberard98/392/orig   -> origin/gh/davidberard98/392/orig
2025-12-04T09:33:41.4891004Z  * [new branch]              gh/davidberard98/399/base   -> origin/gh/davidberard98/399/base
2025-12-04T09:33:41.4892331Z  * [new branch]              gh/davidberard98/399/head   -> origin/gh/davidberard98/399/head
2025-12-04T09:33:41.4893654Z  * [new branch]              gh/davidberard98/399/orig   -> origin/gh/davidberard98/399/orig
2025-12-04T09:33:41.4895693Z  * [new branch]              gh/desertfire/605/base      -> origin/gh/desertfire/605/base
2025-12-04T09:33:41.4896973Z  * [new branch]              gh/desertfire/605/head      -> origin/gh/desertfire/605/head
2025-12-04T09:33:41.4898290Z  * [new branch]              gh/desertfire/605/orig      -> origin/gh/desertfire/605/orig
2025-12-04T09:33:41.4899965Z  * [new branch]              gh/desertfire/606/base      -> origin/gh/desertfire/606/base
2025-12-04T09:33:41.4901341Z  * [new branch]              gh/desertfire/606/head      -> origin/gh/desertfire/606/head
2025-12-04T09:33:41.4903095Z  * [new branch]              gh/desertfire/606/orig      -> origin/gh/desertfire/606/orig
2025-12-04T09:33:41.4904909Z  * [new branch]              gh/desertfire/607/base      -> origin/gh/desertfire/607/base
2025-12-04T09:33:41.4906156Z  * [new branch]              gh/desertfire/607/head      -> origin/gh/desertfire/607/head
2025-12-04T09:33:41.4907505Z  * [new branch]              gh/desertfire/607/orig      -> origin/gh/desertfire/607/orig
2025-12-04T09:33:41.4909215Z  * [new branch]              gh/desertfire/608/base      -> origin/gh/desertfire/608/base
2025-12-04T09:33:41.4910445Z  * [new branch]              gh/desertfire/608/head      -> origin/gh/desertfire/608/head
2025-12-04T09:33:41.4911818Z  * [new branch]              gh/desertfire/608/orig      -> origin/gh/desertfire/608/orig
2025-12-04T09:33:41.4913442Z  * [new branch]              gh/desertfire/609/base      -> origin/gh/desertfire/609/base
2025-12-04T09:33:41.4914708Z  * [new branch]              gh/desertfire/609/head      -> origin/gh/desertfire/609/head
2025-12-04T09:33:41.4915994Z  * [new branch]              gh/desertfire/609/orig      -> origin/gh/desertfire/609/orig
2025-12-04T09:33:41.4917943Z  * [new branch]              gh/desertfire/610/base      -> origin/gh/desertfire/610/base
2025-12-04T09:33:41.4919519Z  * [new branch]              gh/desertfire/610/head      -> origin/gh/desertfire/610/head
2025-12-04T09:33:41.4920908Z  * [new branch]              gh/desertfire/610/orig      -> origin/gh/desertfire/610/orig
2025-12-04T09:33:41.4922759Z  * [new branch]              gh/desertfire/611/base      -> origin/gh/desertfire/611/base
2025-12-04T09:33:41.4924183Z  * [new branch]              gh/desertfire/611/head      -> origin/gh/desertfire/611/head
2025-12-04T09:33:41.4925556Z  * [new branch]              gh/desertfire/611/orig      -> origin/gh/desertfire/611/orig
2025-12-04T09:33:41.4927290Z  * [new branch]              gh/desertfire/612/base      -> origin/gh/desertfire/612/base
2025-12-04T09:33:41.4928696Z  * [new branch]              gh/desertfire/612/head      -> origin/gh/desertfire/612/head
2025-12-04T09:33:41.4929883Z  * [new branch]              gh/desertfire/612/orig      -> origin/gh/desertfire/612/orig
2025-12-04T09:33:41.4932099Z  * [new branch]              gh/desertfire/613/base      -> origin/gh/desertfire/613/base
2025-12-04T09:33:41.4933496Z  * [new branch]              gh/desertfire/613/head      -> origin/gh/desertfire/613/head
2025-12-04T09:33:41.4934836Z  * [new branch]              gh/desertfire/613/orig      -> origin/gh/desertfire/613/orig
2025-12-04T09:33:41.4936699Z  * [new branch]              gh/desertfire/614/base      -> origin/gh/desertfire/614/base
2025-12-04T09:33:41.4938150Z  * [new branch]              gh/desertfire/614/head      -> origin/gh/desertfire/614/head
2025-12-04T09:33:41.4939456Z  * [new branch]              gh/desertfire/614/orig      -> origin/gh/desertfire/614/orig
2025-12-04T09:33:41.4941341Z  * [new branch]              gh/desertfire/615/base      -> origin/gh/desertfire/615/base
2025-12-04T09:33:41.4942923Z  * [new branch]              gh/desertfire/615/head      -> origin/gh/desertfire/615/head
2025-12-04T09:33:41.4944196Z  * [new branch]              gh/desertfire/615/orig      -> origin/gh/desertfire/615/orig
2025-12-04T09:33:41.4945761Z  * [new branch]              gh/desertfire/616/base      -> origin/gh/desertfire/616/base
2025-12-04T09:33:41.4947151Z  * [new branch]              gh/desertfire/616/head      -> origin/gh/desertfire/616/head
2025-12-04T09:33:41.4948357Z  * [new branch]              gh/desertfire/616/orig      -> origin/gh/desertfire/616/orig
2025-12-04T09:33:41.4949990Z  * [new branch]              gh/desertfire/617/base      -> origin/gh/desertfire/617/base
2025-12-04T09:33:41.4951338Z  * [new branch]              gh/desertfire/617/head      -> origin/gh/desertfire/617/head
2025-12-04T09:33:41.4952535Z  * [new branch]              gh/desertfire/617/orig      -> origin/gh/desertfire/617/orig
2025-12-04T09:33:41.4954602Z  * [new branch]              gh/dharakk/1/base           -> origin/gh/dharakk/1/base
2025-12-04T09:33:41.4955993Z  * [new branch]              gh/dharakk/1/head           -> origin/gh/dharakk/1/head
2025-12-04T09:33:41.4958201Z  * [new branch]              gh/drisspg/170/base         -> origin/gh/drisspg/170/base
2025-12-04T09:33:41.4959419Z  * [new branch]              gh/drisspg/170/head         -> origin/gh/drisspg/170/head
2025-12-04T09:33:41.4960708Z  * [new branch]              gh/drisspg/170/orig         -> origin/gh/drisspg/170/orig
2025-12-04T09:33:41.4962417Z  * [new branch]              gh/drisspg/182/base         -> origin/gh/drisspg/182/base
2025-12-04T09:33:41.4963862Z  * [new branch]              gh/drisspg/182/head         -> origin/gh/drisspg/182/head
2025-12-04T09:33:41.4965431Z  * [new branch]              gh/drisspg/183/base         -> origin/gh/drisspg/183/base
2025-12-04T09:33:41.4966617Z  * [new branch]              gh/drisspg/183/head         -> origin/gh/drisspg/183/head
2025-12-04T09:33:41.4968154Z  * [new branch]              gh/drisspg/184/base         -> origin/gh/drisspg/184/base
2025-12-04T09:33:41.4969291Z  * [new branch]              gh/drisspg/184/head         -> origin/gh/drisspg/184/head
2025-12-04T09:33:41.4971051Z  * [new branch]              gh/drisspg/185/base         -> origin/gh/drisspg/185/base
2025-12-04T09:33:41.4972340Z  * [new branch]              gh/drisspg/185/head         -> origin/gh/drisspg/185/head
2025-12-04T09:33:41.4974123Z  * [new branch]              gh/drisspg/194/base         -> origin/gh/drisspg/194/base
2025-12-04T09:33:41.4975447Z  * [new branch]              gh/drisspg/194/head         -> origin/gh/drisspg/194/head
2025-12-04T09:33:41.4976701Z  * [new branch]              gh/drisspg/194/orig         -> origin/gh/drisspg/194/orig
2025-12-04T09:33:41.4978407Z  * [new branch]              gh/drisspg/200/base         -> origin/gh/drisspg/200/base
2025-12-04T09:33:41.4979679Z  * [new branch]              gh/drisspg/200/head         -> origin/gh/drisspg/200/head
2025-12-04T09:33:41.4981580Z  * [new branch]              gh/drisspg/200/orig         -> origin/gh/drisspg/200/orig
2025-12-04T09:33:41.5024501Z  * [new branch]              gh/drisspg/218/base         -> origin/gh/drisspg/218/base
2025-12-04T09:33:41.5025052Z  * [new branch]              gh/drisspg/218/head         -> origin/gh/drisspg/218/head
2025-12-04T09:33:41.5025424Z  * [new branch]              gh/drisspg/218/orig         -> origin/gh/drisspg/218/orig
2025-12-04T09:33:41.5025736Z  * [new branch]              gh/drisspg/219/base         -> origin/gh/drisspg/219/base
2025-12-04T09:33:41.5025983Z  * [new branch]              gh/drisspg/219/head         -> origin/gh/drisspg/219/head
2025-12-04T09:33:41.5026243Z  * [new branch]              gh/drisspg/219/orig         -> origin/gh/drisspg/219/orig
2025-12-04T09:33:41.5026486Z  * [new branch]              gh/drisspg/220/base         -> origin/gh/drisspg/220/base
2025-12-04T09:33:41.5026745Z  * [new branch]              gh/drisspg/220/head         -> origin/gh/drisspg/220/head
2025-12-04T09:33:41.5026996Z  * [new branch]              gh/drisspg/220/orig         -> origin/gh/drisspg/220/orig
2025-12-04T09:33:41.5027240Z  * [new branch]              gh/drisspg/221/base         -> origin/gh/drisspg/221/base
2025-12-04T09:33:41.5027498Z  * [new branch]              gh/drisspg/221/head         -> origin/gh/drisspg/221/head
2025-12-04T09:33:41.5027910Z  * [new branch]              gh/drisspg/221/orig         -> origin/gh/drisspg/221/orig
2025-12-04T09:33:41.5028171Z  * [new branch]              gh/drisspg/222/base         -> origin/gh/drisspg/222/base
2025-12-04T09:33:41.5028414Z  * [new branch]              gh/drisspg/222/head         -> origin/gh/drisspg/222/head
2025-12-04T09:33:41.5028655Z  * [new branch]              gh/drisspg/222/orig         -> origin/gh/drisspg/222/orig
2025-12-04T09:33:41.5028914Z  * [new branch]              gh/drisspg/223/base         -> origin/gh/drisspg/223/base
2025-12-04T09:33:41.5029155Z  * [new branch]              gh/drisspg/223/head         -> origin/gh/drisspg/223/head
2025-12-04T09:33:41.5029403Z  * [new branch]              gh/drisspg/223/orig         -> origin/gh/drisspg/223/orig
2025-12-04T09:33:41.5029662Z  * [new branch]              gh/drisspg/224/base         -> origin/gh/drisspg/224/base
2025-12-04T09:33:41.5029903Z  * [new branch]              gh/drisspg/224/head         -> origin/gh/drisspg/224/head
2025-12-04T09:33:41.5030170Z  * [new branch]              gh/drisspg/224/orig         -> origin/gh/drisspg/224/orig
2025-12-04T09:33:41.5030415Z  * [new branch]              gh/drisspg/225/base         -> origin/gh/drisspg/225/base
2025-12-04T09:33:41.5030658Z  * [new branch]              gh/drisspg/225/head         -> origin/gh/drisspg/225/head
2025-12-04T09:33:41.5030920Z  * [new branch]              gh/drisspg/225/orig         -> origin/gh/drisspg/225/orig
2025-12-04T09:33:41.5031166Z  * [new branch]              gh/drisspg/226/base         -> origin/gh/drisspg/226/base
2025-12-04T09:33:41.5031421Z  * [new branch]              gh/drisspg/226/head         -> origin/gh/drisspg/226/head
2025-12-04T09:33:41.5031668Z  * [new branch]              gh/drisspg/226/orig         -> origin/gh/drisspg/226/orig
2025-12-04T09:33:41.5031910Z  * [new branch]              gh/drisspg/227/base         -> origin/gh/drisspg/227/base
2025-12-04T09:33:41.5032170Z  * [new branch]              gh/drisspg/227/head         -> origin/gh/drisspg/227/head
2025-12-04T09:33:41.5032425Z  * [new branch]              gh/drisspg/227/orig         -> origin/gh/drisspg/227/orig
2025-12-04T09:33:41.5032686Z  * [new branch]              gh/drisspg/228/base         -> origin/gh/drisspg/228/base
2025-12-04T09:33:41.5032932Z  * [new branch]              gh/drisspg/228/head         -> origin/gh/drisspg/228/head
2025-12-04T09:33:41.5033175Z  * [new branch]              gh/drisspg/228/orig         -> origin/gh/drisspg/228/orig
2025-12-04T09:33:41.5033435Z  * [new branch]              gh/drisspg/229/base         -> origin/gh/drisspg/229/base
2025-12-04T09:33:41.5033678Z  * [new branch]              gh/drisspg/229/head         -> origin/gh/drisspg/229/head
2025-12-04T09:33:41.5034019Z  * [new branch]              gh/drisspg/229/orig         -> origin/gh/drisspg/229/orig
2025-12-04T09:33:41.5035704Z  * [new branch]              gh/drisspg/230/base         -> origin/gh/drisspg/230/base
2025-12-04T09:33:41.5036888Z  * [new branch]              gh/drisspg/230/head         -> origin/gh/drisspg/230/head
2025-12-04T09:33:41.5038167Z  * [new branch]              gh/drisspg/230/orig         -> origin/gh/drisspg/230/orig
2025-12-04T09:33:41.5040288Z  * [new branch]              gh/dsjohns2/1/base          -> origin/gh/dsjohns2/1/base
2025-12-04T09:33:41.5041632Z  * [new branch]              gh/dsjohns2/1/head          -> origin/gh/dsjohns2/1/head
2025-12-04T09:33:41.5044036Z  * [new branch]              gh/dzmitry-huba/1/base      -> origin/gh/dzmitry-huba/1/base
2025-12-04T09:33:41.5045383Z  * [new branch]              gh/dzmitry-huba/1/head      -> origin/gh/dzmitry-huba/1/head
2025-12-04T09:33:41.5047344Z  * [new branch]              gh/dzmitry-huba/12/base     -> origin/gh/dzmitry-huba/12/base
2025-12-04T09:33:41.5048737Z  * [new branch]              gh/dzmitry-huba/12/head     -> origin/gh/dzmitry-huba/12/head
2025-12-04T09:33:41.5050058Z  * [new branch]              gh/dzmitry-huba/12/orig     -> origin/gh/dzmitry-huba/12/orig
2025-12-04T09:33:41.5051950Z  * [new branch]              gh/dzmitry-huba/13/base     -> origin/gh/dzmitry-huba/13/base
2025-12-04T09:33:41.5053280Z  * [new branch]              gh/dzmitry-huba/13/head     -> origin/gh/dzmitry-huba/13/head
2025-12-04T09:33:41.5054558Z  * [new branch]              gh/dzmitry-huba/13/orig     -> origin/gh/dzmitry-huba/13/orig
2025-12-04T09:33:41.5056236Z  * [new branch]              gh/dzmitry-huba/14/base     -> origin/gh/dzmitry-huba/14/base
2025-12-04T09:33:41.5057534Z  * [new branch]              gh/dzmitry-huba/14/head     -> origin/gh/dzmitry-huba/14/head
2025-12-04T09:33:41.5058821Z  * [new branch]              gh/dzmitry-huba/14/orig     -> origin/gh/dzmitry-huba/14/orig
2025-12-04T09:33:41.5060708Z  * [new branch]              gh/dzmitry-huba/15/base     -> origin/gh/dzmitry-huba/15/base
2025-12-04T09:33:41.5061990Z  * [new branch]              gh/dzmitry-huba/15/head     -> origin/gh/dzmitry-huba/15/head
2025-12-04T09:33:41.5063178Z  * [new branch]              gh/dzmitry-huba/15/orig     -> origin/gh/dzmitry-huba/15/orig
2025-12-04T09:33:41.5065106Z  * [new branch]              gh/dzmitry-huba/16/base     -> origin/gh/dzmitry-huba/16/base
2025-12-04T09:33:41.5066513Z  * [new branch]              gh/dzmitry-huba/16/head     -> origin/gh/dzmitry-huba/16/head
2025-12-04T09:33:41.5067878Z  * [new branch]              gh/dzmitry-huba/16/orig     -> origin/gh/dzmitry-huba/16/orig
2025-12-04T09:33:41.5069587Z  * [new branch]              gh/dzmitry-huba/17/base     -> origin/gh/dzmitry-huba/17/base
2025-12-04T09:33:41.5070887Z  * [new branch]              gh/dzmitry-huba/17/head     -> origin/gh/dzmitry-huba/17/head
2025-12-04T09:33:41.5072198Z  * [new branch]              gh/dzmitry-huba/17/orig     -> origin/gh/dzmitry-huba/17/orig
2025-12-04T09:33:41.5073710Z  * [new branch]              gh/dzmitry-huba/2/base      -> origin/gh/dzmitry-huba/2/base
2025-12-04T09:33:41.5074901Z  * [new branch]              gh/dzmitry-huba/2/head      -> origin/gh/dzmitry-huba/2/head
2025-12-04T09:33:41.5076576Z  * [new branch]              gh/dzmitry-huba/3/base      -> origin/gh/dzmitry-huba/3/base
2025-12-04T09:33:41.5077749Z  * [new branch]              gh/dzmitry-huba/3/head      -> origin/gh/dzmitry-huba/3/head
2025-12-04T09:33:41.5079889Z  * [new branch]              gh/eellison/808/base        -> origin/gh/eellison/808/base
2025-12-04T09:33:41.5081265Z  * [new branch]              gh/eellison/808/head        -> origin/gh/eellison/808/head
2025-12-04T09:33:41.5082625Z  * [new branch]              gh/eellison/808/orig        -> origin/gh/eellison/808/orig
2025-12-04T09:33:41.5084744Z  * [new branch]              gh/eellison/822/base        -> origin/gh/eellison/822/base
2025-12-04T09:33:41.5086144Z  * [new branch]              gh/eellison/822/head        -> origin/gh/eellison/822/head
2025-12-04T09:33:41.5087373Z  * [new branch]              gh/eellison/822/orig        -> origin/gh/eellison/822/orig
2025-12-04T09:33:41.5089108Z  * [new branch]              gh/eellison/823/base        -> origin/gh/eellison/823/base
2025-12-04T09:33:41.5090408Z  * [new branch]              gh/eellison/823/head        -> origin/gh/eellison/823/head
2025-12-04T09:33:41.5091700Z  * [new branch]              gh/eellison/823/orig        -> origin/gh/eellison/823/orig
2025-12-04T09:33:41.5093476Z  * [new branch]              gh/eellison/862/base        -> origin/gh/eellison/862/base
2025-12-04T09:33:41.5094746Z  * [new branch]              gh/eellison/862/head        -> origin/gh/eellison/862/head
2025-12-04T09:33:41.5095990Z  * [new branch]              gh/eellison/862/orig        -> origin/gh/eellison/862/orig
2025-12-04T09:33:41.5097746Z  * [new branch]              gh/eellison/863/base        -> origin/gh/eellison/863/base
2025-12-04T09:33:41.5098991Z  * [new branch]              gh/eellison/863/head        -> origin/gh/eellison/863/head
2025-12-04T09:33:41.5100319Z  * [new branch]              gh/eellison/863/orig        -> origin/gh/eellison/863/orig
2025-12-04T09:33:41.5102215Z  * [new branch]              gh/eellison/864/base        -> origin/gh/eellison/864/base
2025-12-04T09:33:41.5103551Z  * [new branch]              gh/eellison/864/head        -> origin/gh/eellison/864/head
2025-12-04T09:33:41.5104886Z  * [new branch]              gh/eellison/864/orig        -> origin/gh/eellison/864/orig
2025-12-04T09:33:41.5106628Z  * [new branch]              gh/eellison/865/base        -> origin/gh/eellison/865/base
2025-12-04T09:33:41.5108683Z  * [new branch]              gh/eellison/865/head        -> origin/gh/eellison/865/head
2025-12-04T09:33:41.5109723Z  * [new branch]              gh/eellison/865/orig        -> origin/gh/eellison/865/orig
2025-12-04T09:33:41.5111506Z  * [new branch]              gh/eellison/866/base        -> origin/gh/eellison/866/base
2025-12-04T09:33:41.5112803Z  * [new branch]              gh/eellison/866/head        -> origin/gh/eellison/866/head
2025-12-04T09:33:41.5114304Z  * [new branch]              gh/eellison/866/orig        -> origin/gh/eellison/866/orig
2025-12-04T09:33:41.5115997Z  * [new branch]              gh/eellison/867/base        -> origin/gh/eellison/867/base
2025-12-04T09:33:41.5117240Z  * [new branch]              gh/eellison/867/head        -> origin/gh/eellison/867/head
2025-12-04T09:33:41.5118553Z  * [new branch]              gh/eellison/867/orig        -> origin/gh/eellison/867/orig
2025-12-04T09:33:41.5120431Z  * [new branch]              gh/eellison/868/base        -> origin/gh/eellison/868/base
2025-12-04T09:33:41.5122004Z  * [new branch]              gh/eellison/868/head        -> origin/gh/eellison/868/head
2025-12-04T09:33:41.5123454Z  * [new branch]              gh/eellison/868/orig        -> origin/gh/eellison/868/orig
2025-12-04T09:33:41.5125146Z  * [new branch]              gh/eellison/869/base        -> origin/gh/eellison/869/base
2025-12-04T09:33:41.5126390Z  * [new branch]              gh/eellison/869/head        -> origin/gh/eellison/869/head
2025-12-04T09:33:41.5128182Z  * [new branch]              gh/eellison/869/orig        -> origin/gh/eellison/869/orig
2025-12-04T09:33:41.5130073Z  * [new branch]              gh/eellison/870/base        -> origin/gh/eellison/870/base
2025-12-04T09:33:41.5131318Z  * [new branch]              gh/eellison/870/head        -> origin/gh/eellison/870/head
2025-12-04T09:33:41.5132556Z  * [new branch]              gh/eellison/870/orig        -> origin/gh/eellison/870/orig
2025-12-04T09:33:41.5134355Z  * [new branch]              gh/eellison/871/base        -> origin/gh/eellison/871/base
2025-12-04T09:33:41.5135549Z  * [new branch]              gh/eellison/871/head        -> origin/gh/eellison/871/head
2025-12-04T09:33:41.5136934Z  * [new branch]              gh/eellison/871/orig        -> origin/gh/eellison/871/orig
2025-12-04T09:33:41.5138733Z  * [new branch]              gh/eellison/872/base        -> origin/gh/eellison/872/base
2025-12-04T09:33:41.5139940Z  * [new branch]              gh/eellison/872/head        -> origin/gh/eellison/872/head
2025-12-04T09:33:41.5141226Z  * [new branch]              gh/eellison/872/orig        -> origin/gh/eellison/872/orig
2025-12-04T09:33:41.5143145Z  * [new branch]              gh/eellison/873/base        -> origin/gh/eellison/873/base
2025-12-04T09:33:41.5144459Z  * [new branch]              gh/eellison/873/head        -> origin/gh/eellison/873/head
2025-12-04T09:33:41.5145738Z  * [new branch]              gh/eellison/873/orig        -> origin/gh/eellison/873/orig
2025-12-04T09:33:41.5147581Z  * [new branch]              gh/eellison/874/base        -> origin/gh/eellison/874/base
2025-12-04T09:33:41.5149173Z  * [new branch]              gh/eellison/874/head        -> origin/gh/eellison/874/head
2025-12-04T09:33:41.5150464Z  * [new branch]              gh/eellison/874/orig        -> origin/gh/eellison/874/orig
2025-12-04T09:33:41.5152681Z  * [new branch]              gh/eellison/875/base        -> origin/gh/eellison/875/base
2025-12-04T09:33:41.5154111Z  * [new branch]              gh/eellison/875/head        -> origin/gh/eellison/875/head
2025-12-04T09:33:41.5155404Z  * [new branch]              gh/eellison/875/orig        -> origin/gh/eellison/875/orig
2025-12-04T09:33:41.5157204Z  * [new branch]              gh/eellison/876/base        -> origin/gh/eellison/876/base
2025-12-04T09:33:41.5158985Z  * [new branch]              gh/eellison/876/head        -> origin/gh/eellison/876/head
2025-12-04T09:33:41.5159798Z  * [new branch]              gh/eellison/876/orig        -> origin/gh/eellison/876/orig
2025-12-04T09:33:41.5161645Z  * [new branch]              gh/eellison/877/base        -> origin/gh/eellison/877/base
2025-12-04T09:33:41.5163058Z  * [new branch]              gh/eellison/877/head        -> origin/gh/eellison/877/head
2025-12-04T09:33:41.5164274Z  * [new branch]              gh/eellison/877/orig        -> origin/gh/eellison/877/orig
2025-12-04T09:33:41.5166222Z  * [new branch]              gh/eellison/878/base        -> origin/gh/eellison/878/base
2025-12-04T09:33:41.5167420Z  * [new branch]              gh/eellison/878/head        -> origin/gh/eellison/878/head
2025-12-04T09:33:41.5168687Z  * [new branch]              gh/eellison/878/orig        -> origin/gh/eellison/878/orig
2025-12-04T09:33:41.5170507Z  * [new branch]              gh/eellison/879/base        -> origin/gh/eellison/879/base
2025-12-04T09:33:41.5171800Z  * [new branch]              gh/eellison/879/head        -> origin/gh/eellison/879/head
2025-12-04T09:33:41.5173089Z  * [new branch]              gh/eellison/879/orig        -> origin/gh/eellison/879/orig
2025-12-04T09:33:41.5174656Z  * [new branch]              gh/eellison/880/base        -> origin/gh/eellison/880/base
2025-12-04T09:33:41.5176016Z  * [new branch]              gh/eellison/880/head        -> origin/gh/eellison/880/head
2025-12-04T09:33:41.5177306Z  * [new branch]              gh/eellison/880/orig        -> origin/gh/eellison/880/orig
2025-12-04T09:33:41.5179091Z  * [new branch]              gh/eellison/881/base        -> origin/gh/eellison/881/base
2025-12-04T09:33:41.5180396Z  * [new branch]              gh/eellison/881/head        -> origin/gh/eellison/881/head
2025-12-04T09:33:41.5181681Z  * [new branch]              gh/eellison/881/orig        -> origin/gh/eellison/881/orig
2025-12-04T09:33:41.5183534Z  * [new branch]              gh/eellison/882/base        -> origin/gh/eellison/882/base
2025-12-04T09:33:41.5184815Z  * [new branch]              gh/eellison/882/head        -> origin/gh/eellison/882/head
2025-12-04T09:33:41.5186409Z  * [new branch]              gh/eellison/882/orig        -> origin/gh/eellison/882/orig
2025-12-04T09:33:41.5188115Z  * [new branch]              gh/eellison/883/base        -> origin/gh/eellison/883/base
2025-12-04T09:33:41.5189384Z  * [new branch]              gh/eellison/883/head        -> origin/gh/eellison/883/head
2025-12-04T09:33:41.5190692Z  * [new branch]              gh/eellison/883/orig        -> origin/gh/eellison/883/orig
2025-12-04T09:33:41.5192262Z  * [new branch]              gh/eellison/884/base        -> origin/gh/eellison/884/base
2025-12-04T09:33:41.5193586Z  * [new branch]              gh/eellison/884/head        -> origin/gh/eellison/884/head
2025-12-04T09:33:41.5194769Z  * [new branch]              gh/eellison/884/orig        -> origin/gh/eellison/884/orig
2025-12-04T09:33:41.5196873Z  * [new branch]              gh/etaf/147/base            -> origin/gh/etaf/147/base
2025-12-04T09:33:41.5198165Z  * [new branch]              gh/etaf/147/head            -> origin/gh/etaf/147/head
2025-12-04T09:33:41.5200180Z  * [new branch]              gh/etaf/154/base            -> origin/gh/etaf/154/base
2025-12-04T09:33:41.5205025Z  * [new branch]              gh/etaf/154/head            -> origin/gh/etaf/154/head
2025-12-04T09:33:41.5206517Z  * [new branch]              gh/etaf/154/orig            -> origin/gh/etaf/154/orig
2025-12-04T09:33:41.5208218Z  * [new branch]              gh/etaf/156/base            -> origin/gh/etaf/156/base
2025-12-04T09:33:41.5209538Z  * [new branch]              gh/etaf/156/head            -> origin/gh/etaf/156/head
2025-12-04T09:33:41.5210888Z  * [new branch]              gh/etaf/156/orig            -> origin/gh/etaf/156/orig
2025-12-04T09:33:41.5212782Z  * [new branch]              gh/etaf/157/base            -> origin/gh/etaf/157/base
2025-12-04T09:33:41.5214103Z  * [new branch]              gh/etaf/157/head            -> origin/gh/etaf/157/head
2025-12-04T09:33:41.5215450Z  * [new branch]              gh/etaf/157/orig            -> origin/gh/etaf/157/orig
2025-12-04T09:33:41.5217127Z  * [new branch]              gh/etaf/158/base            -> origin/gh/etaf/158/base
2025-12-04T09:33:41.5218504Z  * [new branch]              gh/etaf/158/head            -> origin/gh/etaf/158/head
2025-12-04T09:33:41.5219792Z  * [new branch]              gh/etaf/158/orig            -> origin/gh/etaf/158/orig
2025-12-04T09:33:41.5221734Z  * [new branch]              gh/etaf/159/base            -> origin/gh/etaf/159/base
2025-12-04T09:33:41.5223031Z  * [new branch]              gh/etaf/159/head            -> origin/gh/etaf/159/head
2025-12-04T09:33:41.5224301Z  * [new branch]              gh/etaf/159/orig            -> origin/gh/etaf/159/orig
2025-12-04T09:33:41.5226210Z  * [new branch]              gh/etaf/160/base            -> origin/gh/etaf/160/base
2025-12-04T09:33:41.5227534Z  * [new branch]              gh/etaf/160/head            -> origin/gh/etaf/160/head
2025-12-04T09:33:41.5228901Z  * [new branch]              gh/etaf/160/orig            -> origin/gh/etaf/160/orig
2025-12-04T09:33:41.5230623Z  * [new branch]              gh/etaf/161/base            -> origin/gh/etaf/161/base
2025-12-04T09:33:41.5232015Z  * [new branch]              gh/etaf/161/head            -> origin/gh/etaf/161/head
2025-12-04T09:33:41.5233374Z  * [new branch]              gh/etaf/161/orig            -> origin/gh/etaf/161/orig
2025-12-04T09:33:41.5235114Z  * [new branch]              gh/etaf/166/base            -> origin/gh/etaf/166/base
2025-12-04T09:33:41.5236603Z  * [new branch]              gh/etaf/166/head            -> origin/gh/etaf/166/head
2025-12-04T09:33:41.5237840Z  * [new branch]              gh/etaf/166/orig            -> origin/gh/etaf/166/orig
2025-12-04T09:33:41.5239603Z  * [new branch]              gh/etaf/167/base            -> origin/gh/etaf/167/base
2025-12-04T09:33:41.5240894Z  * [new branch]              gh/etaf/167/head            -> origin/gh/etaf/167/head
2025-12-04T09:33:41.5242246Z  * [new branch]              gh/etaf/167/orig            -> origin/gh/etaf/167/orig
2025-12-04T09:33:41.5244212Z  * [new branch]              gh/etaf/168/base            -> origin/gh/etaf/168/base
2025-12-04T09:33:41.5245589Z  * [new branch]              gh/etaf/168/head            -> origin/gh/etaf/168/head
2025-12-04T09:33:41.5246937Z  * [new branch]              gh/etaf/168/orig            -> origin/gh/etaf/168/orig
2025-12-04T09:33:41.5248792Z  * [new branch]              gh/etaf/172/base            -> origin/gh/etaf/172/base
2025-12-04T09:33:41.5250045Z  * [new branch]              gh/etaf/172/head            -> origin/gh/etaf/172/head
2025-12-04T09:33:41.5251350Z  * [new branch]              gh/etaf/172/orig            -> origin/gh/etaf/172/orig
2025-12-04T09:33:41.5253311Z  * [new branch]              gh/etaf/173/base            -> origin/gh/etaf/173/base
2025-12-04T09:33:41.5254737Z  * [new branch]              gh/etaf/173/head            -> origin/gh/etaf/173/head
2025-12-04T09:33:41.5256019Z  * [new branch]              gh/etaf/173/orig            -> origin/gh/etaf/173/orig
2025-12-04T09:33:41.5257975Z  * [new branch]              gh/etaf/174/base            -> origin/gh/etaf/174/base
2025-12-04T09:33:41.5259169Z  * [new branch]              gh/etaf/174/head            -> origin/gh/etaf/174/head
2025-12-04T09:33:41.5260916Z  * [new branch]              gh/etaf/175/base            -> origin/gh/etaf/175/base
2025-12-04T09:33:41.5262206Z  * [new branch]              gh/etaf/175/head            -> origin/gh/etaf/175/head
2025-12-04T09:33:41.5263374Z  * [new branch]              gh/etaf/175/orig            -> origin/gh/etaf/175/orig
2025-12-04T09:33:41.5265227Z  * [new branch]              gh/etaf/176/base            -> origin/gh/etaf/176/base
2025-12-04T09:33:41.5266619Z  * [new branch]              gh/etaf/176/head            -> origin/gh/etaf/176/head
2025-12-04T09:33:41.5267920Z  * [new branch]              gh/etaf/176/orig            -> origin/gh/etaf/176/orig
2025-12-04T09:33:41.5270072Z  * [new branch]              gh/etaf/177/base            -> origin/gh/etaf/177/base
2025-12-04T09:33:41.5271622Z  * [new branch]              gh/etaf/177/head            -> origin/gh/etaf/177/head
2025-12-04T09:33:41.5272929Z  * [new branch]              gh/etaf/177/orig            -> origin/gh/etaf/177/orig
2025-12-04T09:33:41.5274912Z  * [new branch]              gh/etaf/178/base            -> origin/gh/etaf/178/base
2025-12-04T09:33:41.5276488Z  * [new branch]              gh/etaf/178/head            -> origin/gh/etaf/178/head
2025-12-04T09:33:41.5277786Z  * [new branch]              gh/etaf/178/orig            -> origin/gh/etaf/178/orig
2025-12-04T09:33:41.5279560Z  * [new branch]              gh/etaf/179/base            -> origin/gh/etaf/179/base
2025-12-04T09:33:41.5280890Z  * [new branch]              gh/etaf/179/head            -> origin/gh/etaf/179/head
2025-12-04T09:33:41.5282168Z  * [new branch]              gh/etaf/179/orig            -> origin/gh/etaf/179/orig
2025-12-04T09:33:41.5283972Z  * [new branch]              gh/etaf/180/base            -> origin/gh/etaf/180/base
2025-12-04T09:33:41.5285219Z  * [new branch]              gh/etaf/180/head            -> origin/gh/etaf/180/head
2025-12-04T09:33:41.5286520Z  * [new branch]              gh/etaf/180/orig            -> origin/gh/etaf/180/orig
2025-12-04T09:33:41.5288577Z  * [new branch]              gh/exclamaforte/1/base      -> origin/gh/exclamaforte/1/base
2025-12-04T09:33:41.5290002Z  * [new branch]              gh/exclamaforte/1/head      -> origin/gh/exclamaforte/1/head
2025-12-04T09:33:41.5291535Z  * [new branch]              gh/exclamaforte/2/base      -> origin/gh/exclamaforte/2/base
2025-12-04T09:33:41.5292821Z  * [new branch]              gh/exclamaforte/2/head      -> origin/gh/exclamaforte/2/head
2025-12-04T09:33:41.5294688Z  * [new branch]              gh/exclamaforte/3/base      -> origin/gh/exclamaforte/3/base
2025-12-04T09:33:41.5295793Z  * [new branch]              gh/exclamaforte/3/head      -> origin/gh/exclamaforte/3/head
2025-12-04T09:33:41.5297487Z  * [new branch]              gh/exclamaforte/4/base      -> origin/gh/exclamaforte/4/base
2025-12-04T09:33:41.5298735Z  * [new branch]              gh/exclamaforte/4/head      -> origin/gh/exclamaforte/4/head
2025-12-04T09:33:41.5300992Z  * [new branch]              gh/ezyang/2374/base         -> origin/gh/ezyang/2374/base
2025-12-04T09:33:41.5302439Z  * [new branch]              gh/ezyang/2374/head         -> origin/gh/ezyang/2374/head
2025-12-04T09:33:41.5303858Z  * [new branch]              gh/ezyang/2374/orig         -> origin/gh/ezyang/2374/orig
2025-12-04T09:33:41.5305459Z  * [new branch]              gh/ezyang/2973/base         -> origin/gh/ezyang/2973/base
2025-12-04T09:33:41.5306661Z  * [new branch]              gh/ezyang/2973/head         -> origin/gh/ezyang/2973/head
2025-12-04T09:33:41.5308007Z  * [new branch]              gh/ezyang/2973/orig         -> origin/gh/ezyang/2973/orig
2025-12-04T09:33:41.5309761Z  * [new branch]              gh/ezyang/2974/base         -> origin/gh/ezyang/2974/base
2025-12-04T09:33:41.5311033Z  * [new branch]              gh/ezyang/2974/head         -> origin/gh/ezyang/2974/head
2025-12-04T09:33:41.5312341Z  * [new branch]              gh/ezyang/2974/orig         -> origin/gh/ezyang/2974/orig
2025-12-04T09:33:41.5314017Z  * [new branch]              gh/ezyang/3131/base         -> origin/gh/ezyang/3131/base
2025-12-04T09:33:41.5315315Z  * [new branch]              gh/ezyang/3131/head         -> origin/gh/ezyang/3131/head
2025-12-04T09:33:41.5316570Z  * [new branch]              gh/ezyang/3131/orig         -> origin/gh/ezyang/3131/orig
2025-12-04T09:33:41.5318241Z  * [new branch]              gh/ezyang/3139/base         -> origin/gh/ezyang/3139/base
2025-12-04T09:33:41.5319469Z  * [new branch]              gh/ezyang/3139/head         -> origin/gh/ezyang/3139/head
2025-12-04T09:33:41.5320776Z  * [new branch]              gh/ezyang/3139/orig         -> origin/gh/ezyang/3139/orig
2025-12-04T09:33:41.5322460Z  * [new branch]              gh/ezyang/3140/base         -> origin/gh/ezyang/3140/base
2025-12-04T09:33:41.5324277Z  * [new branch]              gh/ezyang/3140/head         -> origin/gh/ezyang/3140/head
2025-12-04T09:33:41.5325635Z  * [new branch]              gh/ezyang/3140/orig         -> origin/gh/ezyang/3140/orig
2025-12-04T09:33:41.5327465Z  * [new branch]              gh/ezyang/3143/base         -> origin/gh/ezyang/3143/base
2025-12-04T09:33:41.5328760Z  * [new branch]              gh/ezyang/3143/head         -> origin/gh/ezyang/3143/head
2025-12-04T09:33:41.5330539Z  * [new branch]              gh/ezyang/3143/orig         -> origin/gh/ezyang/3143/orig
2025-12-04T09:33:41.5332265Z  * [new branch]              gh/ezyang/3144/base         -> origin/gh/ezyang/3144/base
2025-12-04T09:33:41.5333597Z  * [new branch]              gh/ezyang/3144/head         -> origin/gh/ezyang/3144/head
2025-12-04T09:33:41.5334860Z  * [new branch]              gh/ezyang/3144/orig         -> origin/gh/ezyang/3144/orig
2025-12-04T09:33:41.5336583Z  * [new branch]              gh/ezyang/3167/base         -> origin/gh/ezyang/3167/base
2025-12-04T09:33:41.5337852Z  * [new branch]              gh/ezyang/3167/head         -> origin/gh/ezyang/3167/head
2025-12-04T09:33:41.5339203Z  * [new branch]              gh/ezyang/3167/orig         -> origin/gh/ezyang/3167/orig
2025-12-04T09:33:41.5340896Z  * [new branch]              gh/ezyang/3173/base         -> origin/gh/ezyang/3173/base
2025-12-04T09:33:41.5342136Z  * [new branch]              gh/ezyang/3173/head         -> origin/gh/ezyang/3173/head
2025-12-04T09:33:41.5343491Z  * [new branch]              gh/ezyang/3173/orig         -> origin/gh/ezyang/3173/orig
2025-12-04T09:33:41.5345255Z  * [new branch]              gh/ezyang/3175/base         -> origin/gh/ezyang/3175/base
2025-12-04T09:33:41.5346527Z  * [new branch]              gh/ezyang/3175/head         -> origin/gh/ezyang/3175/head
2025-12-04T09:33:41.5347793Z  * [new branch]              gh/ezyang/3175/orig         -> origin/gh/ezyang/3175/orig
2025-12-04T09:33:41.5349459Z  * [new branch]              gh/ezyang/3182/base         -> origin/gh/ezyang/3182/base
2025-12-04T09:33:41.5350761Z  * [new branch]              gh/ezyang/3182/head         -> origin/gh/ezyang/3182/head
2025-12-04T09:33:41.5351994Z  * [new branch]              gh/ezyang/3182/orig         -> origin/gh/ezyang/3182/orig
2025-12-04T09:33:41.5353686Z  * [new branch]              gh/ezyang/3185/base         -> origin/gh/ezyang/3185/base
2025-12-04T09:33:41.5355018Z  * [new branch]              gh/ezyang/3185/head         -> origin/gh/ezyang/3185/head
2025-12-04T09:33:41.5356218Z  * [new branch]              gh/ezyang/3185/orig         -> origin/gh/ezyang/3185/orig
2025-12-04T09:33:41.5357845Z  * [new branch]              gh/ezyang/3189/base         -> origin/gh/ezyang/3189/base
2025-12-04T09:33:41.5359147Z  * [new branch]              gh/ezyang/3189/head         -> origin/gh/ezyang/3189/head
2025-12-04T09:33:41.5360425Z  * [new branch]              gh/ezyang/3189/orig         -> origin/gh/ezyang/3189/orig
2025-12-04T09:33:41.5362238Z  * [new branch]              gh/ezyang/3191/base         -> origin/gh/ezyang/3191/base
2025-12-04T09:33:41.5363588Z  * [new branch]              gh/ezyang/3191/head         -> origin/gh/ezyang/3191/head
2025-12-04T09:33:41.5364882Z  * [new branch]              gh/ezyang/3191/orig         -> origin/gh/ezyang/3191/orig
2025-12-04T09:33:41.5367080Z  * [new branch]              gh/ezyang/3192/base         -> origin/gh/ezyang/3192/base
2025-12-04T09:33:41.5368371Z  * [new branch]              gh/ezyang/3192/head         -> origin/gh/ezyang/3192/head
2025-12-04T09:33:41.5369755Z  * [new branch]              gh/ezyang/3192/orig         -> origin/gh/ezyang/3192/orig
2025-12-04T09:33:41.5371497Z  * [new branch]              gh/ezyang/3193/base         -> origin/gh/ezyang/3193/base
2025-12-04T09:33:41.5372799Z  * [new branch]              gh/ezyang/3193/head         -> origin/gh/ezyang/3193/head
2025-12-04T09:33:41.5374112Z  * [new branch]              gh/ezyang/3193/orig         -> origin/gh/ezyang/3193/orig
2025-12-04T09:33:41.5375840Z  * [new branch]              gh/ezyang/3194/base         -> origin/gh/ezyang/3194/base
2025-12-04T09:33:41.5377110Z  * [new branch]              gh/ezyang/3194/head         -> origin/gh/ezyang/3194/head
2025-12-04T09:33:41.5378375Z  * [new branch]              gh/ezyang/3194/orig         -> origin/gh/ezyang/3194/orig
2025-12-04T09:33:41.5380171Z  * [new branch]              gh/ezyang/3195/base         -> origin/gh/ezyang/3195/base
2025-12-04T09:33:41.5381743Z  * [new branch]              gh/ezyang/3195/head         -> origin/gh/ezyang/3195/head
2025-12-04T09:33:41.5383046Z  * [new branch]              gh/ezyang/3195/orig         -> origin/gh/ezyang/3195/orig
2025-12-04T09:33:41.5384814Z  * [new branch]              gh/ezyang/3196/base         -> origin/gh/ezyang/3196/base
2025-12-04T09:33:41.5386058Z  * [new branch]              gh/ezyang/3196/head         -> origin/gh/ezyang/3196/head
2025-12-04T09:33:41.5387385Z  * [new branch]              gh/ezyang/3196/orig         -> origin/gh/ezyang/3196/orig
2025-12-04T09:33:41.5389072Z  * [new branch]              gh/ezyang/3197/base         -> origin/gh/ezyang/3197/base
2025-12-04T09:33:41.5390384Z  * [new branch]              gh/ezyang/3197/head         -> origin/gh/ezyang/3197/head
2025-12-04T09:33:41.5391676Z  * [new branch]              gh/ezyang/3197/orig         -> origin/gh/ezyang/3197/orig
2025-12-04T09:33:41.5393360Z  * [new branch]              gh/ezyang/3198/base         -> origin/gh/ezyang/3198/base
2025-12-04T09:33:41.5394649Z  * [new branch]              gh/ezyang/3198/head         -> origin/gh/ezyang/3198/head
2025-12-04T09:33:41.5395950Z  * [new branch]              gh/ezyang/3198/orig         -> origin/gh/ezyang/3198/orig
2025-12-04T09:33:41.5397795Z  * [new branch]              gh/ezyang/3199/base         -> origin/gh/ezyang/3199/base
2025-12-04T09:33:41.5399026Z  * [new branch]              gh/ezyang/3199/head         -> origin/gh/ezyang/3199/head
2025-12-04T09:33:41.5400377Z  * [new branch]              gh/ezyang/3199/orig         -> origin/gh/ezyang/3199/orig
2025-12-04T09:33:41.5402428Z  * [new branch]              gh/ezyang/3200/base         -> origin/gh/ezyang/3200/base
2025-12-04T09:33:41.5403799Z  * [new branch]              gh/ezyang/3200/head         -> origin/gh/ezyang/3200/head
2025-12-04T09:33:41.5405098Z  * [new branch]              gh/ezyang/3200/orig         -> origin/gh/ezyang/3200/orig
2025-12-04T09:33:41.5406815Z  * [new branch]              gh/ezyang/3201/base         -> origin/gh/ezyang/3201/base
2025-12-04T09:33:41.5408173Z  * [new branch]              gh/ezyang/3201/head         -> origin/gh/ezyang/3201/head
2025-12-04T09:33:41.5409325Z  * [new branch]              gh/ezyang/3201/orig         -> origin/gh/ezyang/3201/orig
2025-12-04T09:33:41.5411088Z  * [new branch]              gh/ezyang/3202/base         -> origin/gh/ezyang/3202/base
2025-12-04T09:33:41.5412574Z  * [new branch]              gh/ezyang/3202/head         -> origin/gh/ezyang/3202/head
2025-12-04T09:33:41.5413612Z  * [new branch]              gh/ezyang/3202/orig         -> origin/gh/ezyang/3202/orig
2025-12-04T09:33:41.5415460Z  * [new branch]              gh/ezyang/3203/base         -> origin/gh/ezyang/3203/base
2025-12-04T09:33:41.5416722Z  * [new branch]              gh/ezyang/3203/head         -> origin/gh/ezyang/3203/head
2025-12-04T09:33:41.5418218Z  * [new branch]              gh/ezyang/3203/orig         -> origin/gh/ezyang/3203/orig
2025-12-04T09:33:41.5419924Z  * [new branch]              gh/ezyang/3204/base         -> origin/gh/ezyang/3204/base
2025-12-04T09:33:41.5421231Z  * [new branch]              gh/ezyang/3204/head         -> origin/gh/ezyang/3204/head
2025-12-04T09:33:41.5422496Z  * [new branch]              gh/ezyang/3204/orig         -> origin/gh/ezyang/3204/orig
2025-12-04T09:33:41.5424259Z  * [new branch]              gh/ezyang/3205/base         -> origin/gh/ezyang/3205/base
2025-12-04T09:33:41.5425549Z  * [new branch]              gh/ezyang/3205/head         -> origin/gh/ezyang/3205/head
2025-12-04T09:33:41.5426848Z  * [new branch]              gh/ezyang/3205/orig         -> origin/gh/ezyang/3205/orig
2025-12-04T09:33:41.5428503Z  * [new branch]              gh/ezyang/3206/base         -> origin/gh/ezyang/3206/base
2025-12-04T09:33:41.5429759Z  * [new branch]              gh/ezyang/3206/head         -> origin/gh/ezyang/3206/head
2025-12-04T09:33:41.5431095Z  * [new branch]              gh/ezyang/3206/orig         -> origin/gh/ezyang/3206/orig
2025-12-04T09:33:41.5432868Z  * [new branch]              gh/ezyang/3207/base         -> origin/gh/ezyang/3207/base
2025-12-04T09:33:41.5434144Z  * [new branch]              gh/ezyang/3207/head         -> origin/gh/ezyang/3207/head
2025-12-04T09:33:41.5435433Z  * [new branch]              gh/ezyang/3207/orig         -> origin/gh/ezyang/3207/orig
2025-12-04T09:33:41.5437128Z  * [new branch]              gh/ezyang/3208/base         -> origin/gh/ezyang/3208/base
2025-12-04T09:33:41.5438406Z  * [new branch]              gh/ezyang/3208/head         -> origin/gh/ezyang/3208/head
2025-12-04T09:33:41.5439676Z  * [new branch]              gh/ezyang/3208/orig         -> origin/gh/ezyang/3208/orig
2025-12-04T09:33:41.5441398Z  * [new branch]              gh/ezyang/3209/base         -> origin/gh/ezyang/3209/base
2025-12-04T09:33:41.5442747Z  * [new branch]              gh/ezyang/3209/head         -> origin/gh/ezyang/3209/head
2025-12-04T09:33:41.5444093Z  * [new branch]              gh/ezyang/3209/orig         -> origin/gh/ezyang/3209/orig
2025-12-04T09:33:41.5446080Z  * [new branch]              gh/fadara01/3/base          -> origin/gh/fadara01/3/base
2025-12-04T09:33:41.5447336Z  * [new branch]              gh/fadara01/3/head          -> origin/gh/fadara01/3/head
2025-12-04T09:33:41.5448751Z  * [new branch]              gh/fadara01/3/orig          -> origin/gh/fadara01/3/orig
2025-12-04T09:33:41.5450465Z  * [new branch]              gh/fadara01/5/base          -> origin/gh/fadara01/5/base
2025-12-04T09:33:41.5451781Z  * [new branch]              gh/fadara01/5/head          -> origin/gh/fadara01/5/head
2025-12-04T09:33:41.5453032Z  * [new branch]              gh/fadara01/5/orig          -> origin/gh/fadara01/5/orig
2025-12-04T09:33:41.5454699Z  * [new branch]              gh/fadara01/6/base          -> origin/gh/fadara01/6/base
2025-12-04T09:33:41.5455951Z  * [new branch]              gh/fadara01/6/head          -> origin/gh/fadara01/6/head
2025-12-04T09:33:41.5457233Z  * [new branch]              gh/fadara01/6/orig          -> origin/gh/fadara01/6/orig
2025-12-04T09:33:41.5459025Z  * [new branch]              gh/fadara01/7/base          -> origin/gh/fadara01/7/base
2025-12-04T09:33:41.5460223Z  * [new branch]              gh/fadara01/7/head          -> origin/gh/fadara01/7/head
2025-12-04T09:33:41.5461575Z  * [new branch]              gh/fadara01/7/orig          -> origin/gh/fadara01/7/orig
2025-12-04T09:33:41.5463278Z  * [new branch]              gh/fadara01/8/base          -> origin/gh/fadara01/8/base
2025-12-04T09:33:41.5464570Z  * [new branch]              gh/fadara01/8/head          -> origin/gh/fadara01/8/head
2025-12-04T09:33:41.5465909Z  * [new branch]              gh/fadara01/8/orig          -> origin/gh/fadara01/8/orig
2025-12-04T09:33:41.5467610Z  * [new branch]              gh/fadara01/9/base          -> origin/gh/fadara01/9/base
2025-12-04T09:33:41.5468866Z  * [new branch]              gh/fadara01/9/head          -> origin/gh/fadara01/9/head
2025-12-04T09:33:41.5470181Z  * [new branch]              gh/fadara01/9/orig          -> origin/gh/fadara01/9/orig
2025-12-04T09:33:41.5472201Z  * [new branch]              gh/fduwjj/182/base          -> origin/gh/fduwjj/182/base
2025-12-04T09:33:41.5473457Z  * [new branch]              gh/fduwjj/182/head          -> origin/gh/fduwjj/182/head
2025-12-04T09:33:41.5474729Z  * [new branch]              gh/fduwjj/182/orig          -> origin/gh/fduwjj/182/orig
2025-12-04T09:33:41.5476529Z  * [new branch]              gh/fduwjj/211/base          -> origin/gh/fduwjj/211/base
2025-12-04T09:33:41.5477806Z  * [new branch]              gh/fduwjj/211/head          -> origin/gh/fduwjj/211/head
2025-12-04T09:33:41.5479061Z  * [new branch]              gh/fduwjj/211/orig          -> origin/gh/fduwjj/211/orig
2025-12-04T09:33:41.5480766Z  * [new branch]              gh/fduwjj/212/base          -> origin/gh/fduwjj/212/base
2025-12-04T09:33:41.5482131Z  * [new branch]              gh/fduwjj/212/head          -> origin/gh/fduwjj/212/head
2025-12-04T09:33:41.5483548Z  * [new branch]              gh/fduwjj/212/orig          -> origin/gh/fduwjj/212/orig
2025-12-04T09:33:41.5485265Z  * [new branch]              gh/fduwjj/213/base          -> origin/gh/fduwjj/213/base
2025-12-04T09:33:41.5486575Z  * [new branch]              gh/fduwjj/213/head          -> origin/gh/fduwjj/213/head
2025-12-04T09:33:41.5487832Z  * [new branch]              gh/fduwjj/213/orig          -> origin/gh/fduwjj/213/orig
2025-12-04T09:33:41.5489632Z  * [new branch]              gh/fduwjj/226/base          -> origin/gh/fduwjj/226/base
2025-12-04T09:33:41.5490828Z  * [new branch]              gh/fduwjj/226/head          -> origin/gh/fduwjj/226/head
2025-12-04T09:33:41.5492119Z  * [new branch]              gh/fduwjj/226/orig          -> origin/gh/fduwjj/226/orig
2025-12-04T09:33:41.5493953Z  * [new branch]              gh/fduwjj/229/base          -> origin/gh/fduwjj/229/base
2025-12-04T09:33:41.5495203Z  * [new branch]              gh/fduwjj/229/head          -> origin/gh/fduwjj/229/head
2025-12-04T09:33:41.5496500Z  * [new branch]              gh/fduwjj/229/orig          -> origin/gh/fduwjj/229/orig
2025-12-04T09:33:41.5498226Z  * [new branch]              gh/fduwjj/233/base          -> origin/gh/fduwjj/233/base
2025-12-04T09:33:41.5499586Z  * [new branch]              gh/fduwjj/233/head          -> origin/gh/fduwjj/233/head
2025-12-04T09:33:41.5500997Z  * [new branch]              gh/fduwjj/233/orig          -> origin/gh/fduwjj/233/orig
2025-12-04T09:33:41.5502929Z  * [new branch]              gh/fduwjj/234/base          -> origin/gh/fduwjj/234/base
2025-12-04T09:33:41.5504164Z  * [new branch]              gh/fduwjj/234/head          -> origin/gh/fduwjj/234/head
2025-12-04T09:33:41.5505439Z  * [new branch]              gh/fduwjj/234/orig          -> origin/gh/fduwjj/234/orig
2025-12-04T09:33:41.5507133Z  * [new branch]              gh/fduwjj/235/base          -> origin/gh/fduwjj/235/base
2025-12-04T09:33:41.5508505Z  * [new branch]              gh/fduwjj/235/head          -> origin/gh/fduwjj/235/head
2025-12-04T09:33:41.5509802Z  * [new branch]              gh/fduwjj/235/orig          -> origin/gh/fduwjj/235/orig
2025-12-04T09:33:41.5511489Z  * [new branch]              gh/fduwjj/236/base          -> origin/gh/fduwjj/236/base
2025-12-04T09:33:41.5512701Z  * [new branch]              gh/fduwjj/236/head          -> origin/gh/fduwjj/236/head
2025-12-04T09:33:41.5513933Z  * [new branch]              gh/fduwjj/236/orig          -> origin/gh/fduwjj/236/orig
2025-12-04T09:33:41.5515474Z  * [new branch]              gh/fduwjj/237/base          -> origin/gh/fduwjj/237/base
2025-12-04T09:33:41.5516874Z  * [new branch]              gh/fduwjj/237/head          -> origin/gh/fduwjj/237/head
2025-12-04T09:33:41.5518126Z  * [new branch]              gh/fduwjj/237/orig          -> origin/gh/fduwjj/237/orig
2025-12-04T09:33:41.5519850Z  * [new branch]              gh/fduwjj/238/base          -> origin/gh/fduwjj/238/base
2025-12-04T09:33:41.5521202Z  * [new branch]              gh/fduwjj/238/head          -> origin/gh/fduwjj/238/head
2025-12-04T09:33:41.5522538Z  * [new branch]              gh/fduwjj/238/orig          -> origin/gh/fduwjj/238/orig
2025-12-04T09:33:41.5524343Z  * [new branch]              gh/fduwjj/239/base          -> origin/gh/fduwjj/239/base
2025-12-04T09:33:41.5525778Z  * [new branch]              gh/fduwjj/239/head          -> origin/gh/fduwjj/239/head
2025-12-04T09:33:41.5526998Z  * [new branch]              gh/fduwjj/239/orig          -> origin/gh/fduwjj/239/orig
2025-12-04T09:33:41.5529014Z  * [new branch]              gh/fegin/332/base           -> origin/gh/fegin/332/base
2025-12-04T09:33:41.5530903Z  * [new branch]              gh/fegin/332/head           -> origin/gh/fegin/332/head
2025-12-04T09:33:41.5532251Z  * [new branch]              gh/fegin/332/orig           -> origin/gh/fegin/332/orig
2025-12-04T09:33:41.5534109Z  * [new branch]              gh/fegin/333/base           -> origin/gh/fegin/333/base
2025-12-04T09:33:41.5535374Z  * [new branch]              gh/fegin/333/head           -> origin/gh/fegin/333/head
2025-12-04T09:33:41.5536715Z  * [new branch]              gh/fegin/333/orig           -> origin/gh/fegin/333/orig
2025-12-04T09:33:41.5538404Z  * [new branch]              gh/fegin/334/base           -> origin/gh/fegin/334/base
2025-12-04T09:33:41.5539680Z  * [new branch]              gh/fegin/334/head           -> origin/gh/fegin/334/head
2025-12-04T09:33:41.5541119Z  * [new branch]              gh/fegin/334/orig           -> origin/gh/fegin/334/orig
2025-12-04T09:33:41.5542823Z  * [new branch]              gh/fegin/335/base           -> origin/gh/fegin/335/base
2025-12-04T09:33:41.5544081Z  * [new branch]              gh/fegin/335/head           -> origin/gh/fegin/335/head
2025-12-04T09:33:41.5545327Z  * [new branch]              gh/fegin/335/orig           -> origin/gh/fegin/335/orig
2025-12-04T09:33:41.5547342Z  * [new branch]              gh/fffrog/160/base          -> origin/gh/fffrog/160/base
2025-12-04T09:33:41.5548604Z  * [new branch]              gh/fffrog/160/head          -> origin/gh/fffrog/160/head
2025-12-04T09:33:41.5550897Z  * [new branch]              gh/fffrog/177/base          -> origin/gh/fffrog/177/base
2025-12-04T09:33:41.5552176Z  * [new branch]              gh/fffrog/177/head          -> origin/gh/fffrog/177/head
2025-12-04T09:33:41.5553515Z  * [new branch]              gh/fffrog/177/orig          -> origin/gh/fffrog/177/orig
2025-12-04T09:33:41.5555256Z  * [new branch]              gh/fffrog/178/base          -> origin/gh/fffrog/178/base
2025-12-04T09:33:41.5556503Z  * [new branch]              gh/fffrog/178/head          -> origin/gh/fffrog/178/head
2025-12-04T09:33:41.5557799Z  * [new branch]              gh/fffrog/178/orig          -> origin/gh/fffrog/178/orig
2025-12-04T09:33:41.5559442Z  * [new branch]              gh/fffrog/181/base          -> origin/gh/fffrog/181/base
2025-12-04T09:33:41.5560748Z  * [new branch]              gh/fffrog/181/head          -> origin/gh/fffrog/181/head
2025-12-04T09:33:41.5562139Z  * [new branch]              gh/fffrog/181/orig          -> origin/gh/fffrog/181/orig
2025-12-04T09:33:41.5564015Z  * [new branch]              gh/fffrog/183/base          -> origin/gh/fffrog/183/base
2025-12-04T09:33:41.5565163Z  * [new branch]              gh/fffrog/183/head          -> origin/gh/fffrog/183/head
2025-12-04T09:33:41.5566396Z  * [new branch]              gh/fffrog/183/orig          -> origin/gh/fffrog/183/orig
2025-12-04T09:33:41.5568624Z  * [new branch]              gh/fxdawnn/10/base          -> origin/gh/fxdawnn/10/base
2025-12-04T09:33:41.5569838Z  * [new branch]              gh/fxdawnn/10/head          -> origin/gh/fxdawnn/10/head
2025-12-04T09:33:41.5571108Z  * [new branch]              gh/fxdawnn/10/orig          -> origin/gh/fxdawnn/10/orig
2025-12-04T09:33:41.5573167Z  * [new branch]              gh/fxdawnn/11/base          -> origin/gh/fxdawnn/11/base
2025-12-04T09:33:41.5574179Z  * [new branch]              gh/fxdawnn/11/head          -> origin/gh/fxdawnn/11/head
2025-12-04T09:33:41.5575499Z  * [new branch]              gh/fxdawnn/11/orig          -> origin/gh/fxdawnn/11/orig
2025-12-04T09:33:41.5577186Z  * [new branch]              gh/fxdawnn/12/base          -> origin/gh/fxdawnn/12/base
2025-12-04T09:33:41.5578588Z  * [new branch]              gh/fxdawnn/12/head          -> origin/gh/fxdawnn/12/head
2025-12-04T09:33:41.5579896Z  * [new branch]              gh/fxdawnn/12/orig          -> origin/gh/fxdawnn/12/orig
2025-12-04T09:33:41.5581546Z  * [new branch]              gh/fxdawnn/13/base          -> origin/gh/fxdawnn/13/base
2025-12-04T09:33:41.5582885Z  * [new branch]              gh/fxdawnn/13/head          -> origin/gh/fxdawnn/13/head
2025-12-04T09:33:41.5584212Z  * [new branch]              gh/fxdawnn/13/orig          -> origin/gh/fxdawnn/13/orig
2025-12-04T09:33:41.5586120Z  * [new branch]              gh/fxdawnn/14/base          -> origin/gh/fxdawnn/14/base
2025-12-04T09:33:41.5587307Z  * [new branch]              gh/fxdawnn/14/head          -> origin/gh/fxdawnn/14/head
2025-12-04T09:33:41.5589044Z  * [new branch]              gh/fxdawnn/14/orig          -> origin/gh/fxdawnn/14/orig
2025-12-04T09:33:41.5590787Z  * [new branch]              gh/fxdawnn/15/base          -> origin/gh/fxdawnn/15/base
2025-12-04T09:33:41.5592055Z  * [new branch]              gh/fxdawnn/15/head          -> origin/gh/fxdawnn/15/head
2025-12-04T09:33:41.5593341Z  * [new branch]              gh/fxdawnn/15/orig          -> origin/gh/fxdawnn/15/orig
2025-12-04T09:33:41.5595019Z  * [new branch]              gh/fxdawnn/6/base           -> origin/gh/fxdawnn/6/base
2025-12-04T09:33:41.5596329Z  * [new branch]              gh/fxdawnn/6/head           -> origin/gh/fxdawnn/6/head
2025-12-04T09:33:41.5597639Z  * [new branch]              gh/fxdawnn/6/orig           -> origin/gh/fxdawnn/6/orig
2025-12-04T09:33:41.5599835Z  * [new branch]              gh/fxdawnn/7/base           -> origin/gh/fxdawnn/7/base
2025-12-04T09:33:41.5601350Z  * [new branch]              gh/fxdawnn/7/head           -> origin/gh/fxdawnn/7/head
2025-12-04T09:33:41.5605201Z  * [new branch]              gh/fxdawnn/7/orig           -> origin/gh/fxdawnn/7/orig
2025-12-04T09:33:41.5607069Z  * [new branch]              gh/fxdawnn/9/base           -> origin/gh/fxdawnn/9/base
2025-12-04T09:33:41.5608259Z  * [new branch]              gh/fxdawnn/9/head           -> origin/gh/fxdawnn/9/head
2025-12-04T09:33:41.5609869Z  * [new branch]              gh/fxdawnn/9/orig           -> origin/gh/fxdawnn/9/orig
2025-12-04T09:33:41.5611963Z  * [new branch]              gh/galv/1/base              -> origin/gh/galv/1/base
2025-12-04T09:33:41.5613260Z  * [new branch]              gh/galv/1/head              -> origin/gh/galv/1/head
2025-12-04T09:33:41.5614632Z  * [new branch]              gh/galv/1/orig              -> origin/gh/galv/1/orig
2025-12-04T09:33:41.5616322Z  * [new branch]              gh/galv/2/base              -> origin/gh/galv/2/base
2025-12-04T09:33:41.5617619Z  * [new branch]              gh/galv/2/head              -> origin/gh/galv/2/head
2025-12-04T09:33:41.5619008Z  * [new branch]              gh/galv/2/orig              -> origin/gh/galv/2/orig
2025-12-04T09:33:41.5620764Z  * [new branch]              gh/galv/3/base              -> origin/gh/galv/3/base
2025-12-04T09:33:41.5621968Z  * [new branch]              gh/galv/3/head              -> origin/gh/galv/3/head
2025-12-04T09:33:41.5623969Z  * [new branch]              gh/galv/3/orig              -> origin/gh/galv/3/orig
2025-12-04T09:33:41.5625592Z  * [new branch]              gh/guangyey/134/base        -> origin/gh/guangyey/134/base
2025-12-04T09:33:41.5626904Z  * [new branch]              gh/guangyey/134/head        -> origin/gh/guangyey/134/head
2025-12-04T09:33:41.5628204Z  * [new branch]              gh/guangyey/134/orig        -> origin/gh/guangyey/134/orig
2025-12-04T09:33:41.5629869Z  * [new branch]              gh/guangyey/163/base        -> origin/gh/guangyey/163/base
2025-12-04T09:33:41.5631149Z  * [new branch]              gh/guangyey/163/head        -> origin/gh/guangyey/163/head
2025-12-04T09:33:41.5632452Z  * [new branch]              gh/guangyey/163/orig        -> origin/gh/guangyey/163/orig
2025-12-04T09:33:41.5634126Z  * [new branch]              gh/guangyey/168/base        -> origin/gh/guangyey/168/base
2025-12-04T09:33:41.5635414Z  * [new branch]              gh/guangyey/168/head        -> origin/gh/guangyey/168/head
2025-12-04T09:33:41.5636683Z  * [new branch]              gh/guangyey/168/orig        -> origin/gh/guangyey/168/orig
2025-12-04T09:33:41.5638387Z  * [new branch]              gh/guangyey/169/base        -> origin/gh/guangyey/169/base
2025-12-04T09:33:41.5639748Z  * [new branch]              gh/guangyey/169/head        -> origin/gh/guangyey/169/head
2025-12-04T09:33:41.5641069Z  * [new branch]              gh/guangyey/169/orig        -> origin/gh/guangyey/169/orig
2025-12-04T09:33:41.5642867Z  * [new branch]              gh/guangyey/170/base        -> origin/gh/guangyey/170/base
2025-12-04T09:33:41.5644154Z  * [new branch]              gh/guangyey/170/head        -> origin/gh/guangyey/170/head
2025-12-04T09:33:41.5645462Z  * [new branch]              gh/guangyey/170/orig        -> origin/gh/guangyey/170/orig
2025-12-04T09:33:41.5647672Z  * [new branch]              gh/guangyey/171/base        -> origin/gh/guangyey/171/base
2025-12-04T09:33:41.5648947Z  * [new branch]              gh/guangyey/171/head        -> origin/gh/guangyey/171/head
2025-12-04T09:33:41.5650208Z  * [new branch]              gh/guangyey/171/orig        -> origin/gh/guangyey/171/orig
2025-12-04T09:33:41.5651972Z  * [new branch]              gh/guangyey/178/base        -> origin/gh/guangyey/178/base
2025-12-04T09:33:41.5653394Z  * [new branch]              gh/guangyey/178/head        -> origin/gh/guangyey/178/head
2025-12-04T09:33:41.5654606Z  * [new branch]              gh/guangyey/178/orig        -> origin/gh/guangyey/178/orig
2025-12-04T09:33:41.5656304Z  * [new branch]              gh/guangyey/182/base        -> origin/gh/guangyey/182/base
2025-12-04T09:33:41.5657723Z  * [new branch]              gh/guangyey/182/head        -> origin/gh/guangyey/182/head
2025-12-04T09:33:41.5659003Z  * [new branch]              gh/guangyey/182/orig        -> origin/gh/guangyey/182/orig
2025-12-04T09:33:41.5660600Z  * [new branch]              gh/guangyey/183/base        -> origin/gh/guangyey/183/base
2025-12-04T09:33:41.5661862Z  * [new branch]              gh/guangyey/183/head        -> origin/gh/guangyey/183/head
2025-12-04T09:33:41.5663205Z  * [new branch]              gh/guangyey/183/orig        -> origin/gh/guangyey/183/orig
2025-12-04T09:33:41.5664928Z  * [new branch]              gh/guangyey/185/base        -> origin/gh/guangyey/185/base
2025-12-04T09:33:41.5666236Z  * [new branch]              gh/guangyey/185/head        -> origin/gh/guangyey/185/head
2025-12-04T09:33:41.5667504Z  * [new branch]              gh/guangyey/185/orig        -> origin/gh/guangyey/185/orig
2025-12-04T09:33:41.5669220Z  * [new branch]              gh/guangyey/186/base        -> origin/gh/guangyey/186/base
2025-12-04T09:33:41.5670522Z  * [new branch]              gh/guangyey/186/head        -> origin/gh/guangyey/186/head
2025-12-04T09:33:41.5672277Z  * [new branch]              gh/guangyey/186/orig        -> origin/gh/guangyey/186/orig
2025-12-04T09:33:41.5673972Z  * [new branch]              gh/guangyey/187/base        -> origin/gh/guangyey/187/base
2025-12-04T09:33:41.5675328Z  * [new branch]              gh/guangyey/187/head        -> origin/gh/guangyey/187/head
2025-12-04T09:33:41.5676618Z  * [new branch]              gh/guangyey/187/orig        -> origin/gh/guangyey/187/orig
2025-12-04T09:33:41.5678307Z  * [new branch]              gh/guangyey/188/base        -> origin/gh/guangyey/188/base
2025-12-04T09:33:41.5679554Z  * [new branch]              gh/guangyey/188/head        -> origin/gh/guangyey/188/head
2025-12-04T09:33:41.5680845Z  * [new branch]              gh/guangyey/188/orig        -> origin/gh/guangyey/188/orig
2025-12-04T09:33:41.5682590Z  * [new branch]              gh/guangyey/190/base        -> origin/gh/guangyey/190/base
2025-12-04T09:33:41.5683961Z  * [new branch]              gh/guangyey/190/head        -> origin/gh/guangyey/190/head
2025-12-04T09:33:41.5685223Z  * [new branch]              gh/guangyey/190/orig        -> origin/gh/guangyey/190/orig
2025-12-04T09:33:41.5686972Z  * [new branch]              gh/guangyey/208/base        -> origin/gh/guangyey/208/base
2025-12-04T09:33:41.5688454Z  * [new branch]              gh/guangyey/208/head        -> origin/gh/guangyey/208/head
2025-12-04T09:33:41.5689742Z  * [new branch]              gh/guangyey/208/orig        -> origin/gh/guangyey/208/orig
2025-12-04T09:33:41.5691393Z  * [new branch]              gh/guangyey/228/base        -> origin/gh/guangyey/228/base
2025-12-04T09:33:41.5692780Z  * [new branch]              gh/guangyey/228/head        -> origin/gh/guangyey/228/head
2025-12-04T09:33:41.5694086Z  * [new branch]              gh/guangyey/228/orig        -> origin/gh/guangyey/228/orig
2025-12-04T09:33:41.5696191Z  * [new branch]              gh/guangyey/230/base        -> origin/gh/guangyey/230/base
2025-12-04T09:33:41.5697574Z  * [new branch]              gh/guangyey/230/head        -> origin/gh/guangyey/230/head
2025-12-04T09:33:41.5698850Z  * [new branch]              gh/guangyey/230/orig        -> origin/gh/guangyey/230/orig
2025-12-04T09:33:41.5700569Z  * [new branch]              gh/guangyey/231/base        -> origin/gh/guangyey/231/base
2025-12-04T09:33:41.5702193Z  * [new branch]              gh/guangyey/231/head        -> origin/gh/guangyey/231/head
2025-12-04T09:33:41.5703452Z  * [new branch]              gh/guangyey/231/orig        -> origin/gh/guangyey/231/orig
2025-12-04T09:33:41.5705204Z  * [new branch]              gh/guangyey/232/base        -> origin/gh/guangyey/232/base
2025-12-04T09:33:41.5706490Z  * [new branch]              gh/guangyey/232/head        -> origin/gh/guangyey/232/head
2025-12-04T09:33:41.5707808Z  * [new branch]              gh/guangyey/232/orig        -> origin/gh/guangyey/232/orig
2025-12-04T09:33:41.5709549Z  * [new branch]              gh/guangyey/233/base        -> origin/gh/guangyey/233/base
2025-12-04T09:33:41.5710921Z  * [new branch]              gh/guangyey/233/head        -> origin/gh/guangyey/233/head
2025-12-04T09:33:41.5712181Z  * [new branch]              gh/guangyey/233/orig        -> origin/gh/guangyey/233/orig
2025-12-04T09:33:41.5713895Z  * [new branch]              gh/guangyey/234/base        -> origin/gh/guangyey/234/base
2025-12-04T09:33:41.5715185Z  * [new branch]              gh/guangyey/234/head        -> origin/gh/guangyey/234/head
2025-12-04T09:33:41.5716466Z  * [new branch]              gh/guangyey/234/orig        -> origin/gh/guangyey/234/orig
2025-12-04T09:33:41.5718207Z  * [new branch]              gh/guangyey/235/base        -> origin/gh/guangyey/235/base
2025-12-04T09:33:41.5719458Z  * [new branch]              gh/guangyey/235/head        -> origin/gh/guangyey/235/head
2025-12-04T09:33:41.5720718Z  * [new branch]              gh/guangyey/235/orig        -> origin/gh/guangyey/235/orig
2025-12-04T09:33:41.5723319Z  * [new branch]              gh/guangyey/236/base        -> origin/gh/guangyey/236/base
2025-12-04T09:33:41.5724818Z  * [new branch]              gh/guangyey/236/head        -> origin/gh/guangyey/236/head
2025-12-04T09:33:41.5726020Z  * [new branch]              gh/guangyey/236/orig        -> origin/gh/guangyey/236/orig
2025-12-04T09:33:41.5727764Z  * [new branch]              gh/guangyey/237/base        -> origin/gh/guangyey/237/base
2025-12-04T09:33:41.5729156Z  * [new branch]              gh/guangyey/237/head        -> origin/gh/guangyey/237/head
2025-12-04T09:33:41.5730413Z  * [new branch]              gh/guangyey/237/orig        -> origin/gh/guangyey/237/orig
2025-12-04T09:33:41.5732144Z  * [new branch]              gh/guangyey/238/base        -> origin/gh/guangyey/238/base
2025-12-04T09:33:41.5733424Z  * [new branch]              gh/guangyey/238/head        -> origin/gh/guangyey/238/head
2025-12-04T09:33:41.5735171Z  * [new branch]              gh/guangyey/239/base        -> origin/gh/guangyey/239/base
2025-12-04T09:33:41.5736437Z  * [new branch]              gh/guangyey/239/head        -> origin/gh/guangyey/239/head
2025-12-04T09:33:41.5737716Z  * [new branch]              gh/guangyey/239/orig        -> origin/gh/guangyey/239/orig
2025-12-04T09:33:41.5739461Z  * [new branch]              gh/guangyey/240/base        -> origin/gh/guangyey/240/base
2025-12-04T09:33:41.5741270Z  * [new branch]              gh/guangyey/240/head        -> origin/gh/guangyey/240/head
2025-12-04T09:33:41.5742601Z  * [new branch]              gh/guangyey/240/orig        -> origin/gh/guangyey/240/orig
2025-12-04T09:33:41.5744566Z  * [new branch]              gh/guangyey/241/base        -> origin/gh/guangyey/241/base
2025-12-04T09:33:41.5745891Z  * [new branch]              gh/guangyey/241/head        -> origin/gh/guangyey/241/head
2025-12-04T09:33:41.5747240Z  * [new branch]              gh/guangyey/241/orig        -> origin/gh/guangyey/241/orig
2025-12-04T09:33:41.5749134Z  * [new branch]              gh/guangyey/242/base        -> origin/gh/guangyey/242/base
2025-12-04T09:33:41.5750473Z  * [new branch]              gh/guangyey/242/head        -> origin/gh/guangyey/242/head
2025-12-04T09:33:41.5751754Z  * [new branch]              gh/guangyey/242/orig        -> origin/gh/guangyey/242/orig
2025-12-04T09:33:41.5753566Z  * [new branch]              gh/guangyey/243/base        -> origin/gh/guangyey/243/base
2025-12-04T09:33:41.5754854Z  * [new branch]              gh/guangyey/243/head        -> origin/gh/guangyey/243/head
2025-12-04T09:33:41.5756131Z  * [new branch]              gh/guangyey/243/orig        -> origin/gh/guangyey/243/orig
2025-12-04T09:33:41.5757992Z  * [new branch]              gh/guangyey/244/base        -> origin/gh/guangyey/244/base
2025-12-04T09:33:41.5759257Z  * [new branch]              gh/guangyey/244/head        -> origin/gh/guangyey/244/head
2025-12-04T09:33:41.5760537Z  * [new branch]              gh/guangyey/244/orig        -> origin/gh/guangyey/244/orig
2025-12-04T09:33:41.5762356Z  * [new branch]              gh/guangyey/245/base        -> origin/gh/guangyey/245/base
2025-12-04T09:33:41.5763713Z  * [new branch]              gh/guangyey/245/head        -> origin/gh/guangyey/245/head
2025-12-04T09:33:41.5765115Z  * [new branch]              gh/guangyey/245/orig        -> origin/gh/guangyey/245/orig
2025-12-04T09:33:41.5766884Z  * [new branch]              gh/guangyey/246/base        -> origin/gh/guangyey/246/base
2025-12-04T09:33:41.5768174Z  * [new branch]              gh/guangyey/246/head        -> origin/gh/guangyey/246/head
2025-12-04T09:33:41.5769427Z  * [new branch]              gh/guangyey/246/orig        -> origin/gh/guangyey/246/orig
2025-12-04T09:33:41.5771233Z  * [new branch]              gh/guangyey/247/base        -> origin/gh/guangyey/247/base
2025-12-04T09:33:41.5772523Z  * [new branch]              gh/guangyey/247/head        -> origin/gh/guangyey/247/head
2025-12-04T09:33:41.5773791Z  * [new branch]              gh/guangyey/247/orig        -> origin/gh/guangyey/247/orig
2025-12-04T09:33:41.5775609Z  * [new branch]              gh/guangyey/248/base        -> origin/gh/guangyey/248/base
2025-12-04T09:33:41.5776919Z  * [new branch]              gh/guangyey/248/head        -> origin/gh/guangyey/248/head
2025-12-04T09:33:41.5778136Z  * [new branch]              gh/guangyey/248/orig        -> origin/gh/guangyey/248/orig
2025-12-04T09:33:41.5779826Z  * [new branch]              gh/guangyey/249/base        -> origin/gh/guangyey/249/base
2025-12-04T09:33:41.5781177Z  * [new branch]              gh/guangyey/249/head        -> origin/gh/guangyey/249/head
2025-12-04T09:33:41.5782544Z  * [new branch]              gh/guangyey/249/orig        -> origin/gh/guangyey/249/orig
2025-12-04T09:33:41.5784287Z  * [new branch]              gh/guangyey/250/base        -> origin/gh/guangyey/250/base
2025-12-04T09:33:41.5785668Z  * [new branch]              gh/guangyey/250/head        -> origin/gh/guangyey/250/head
2025-12-04T09:33:41.5786977Z  * [new branch]              gh/guangyey/250/orig        -> origin/gh/guangyey/250/orig
2025-12-04T09:33:41.5788657Z  * [new branch]              gh/guangyey/251/base        -> origin/gh/guangyey/251/base
2025-12-04T09:33:41.5789991Z  * [new branch]              gh/guangyey/251/head        -> origin/gh/guangyey/251/head
2025-12-04T09:33:41.5791652Z  * [new branch]              gh/guangyey/251/orig        -> origin/gh/guangyey/251/orig
2025-12-04T09:33:41.5793406Z  * [new branch]              gh/guangyey/252/base        -> origin/gh/guangyey/252/base
2025-12-04T09:33:41.5794655Z  * [new branch]              gh/guangyey/252/head        -> origin/gh/guangyey/252/head
2025-12-04T09:33:41.5795957Z  * [new branch]              gh/guangyey/252/orig        -> origin/gh/guangyey/252/orig
2025-12-04T09:33:41.5797727Z  * [new branch]              gh/guangyey/253/base        -> origin/gh/guangyey/253/base
2025-12-04T09:33:41.5799022Z  * [new branch]              gh/guangyey/253/head        -> origin/gh/guangyey/253/head
2025-12-04T09:33:41.5800413Z  * [new branch]              gh/guangyey/253/orig        -> origin/gh/guangyey/253/orig
2025-12-04T09:33:41.5803087Z  * [new branch]              gh/guangyey/254/base        -> origin/gh/guangyey/254/base
2025-12-04T09:33:41.5804389Z  * [new branch]              gh/guangyey/254/head        -> origin/gh/guangyey/254/head
2025-12-04T09:33:41.5805651Z  * [new branch]              gh/guangyey/254/orig        -> origin/gh/guangyey/254/orig
2025-12-04T09:33:41.5807431Z  * [new branch]              gh/guangyey/255/base        -> origin/gh/guangyey/255/base
2025-12-04T09:33:41.5808686Z  * [new branch]              gh/guangyey/255/head        -> origin/gh/guangyey/255/head
2025-12-04T09:33:41.5809991Z  * [new branch]              gh/guangyey/255/orig        -> origin/gh/guangyey/255/orig
2025-12-04T09:33:41.5812233Z  * [new branch]              gh/guilhermeleobas/107/base -> origin/gh/guilhermeleobas/107/base
2025-12-04T09:33:41.5814026Z  * [new branch]              gh/guilhermeleobas/107/head -> origin/gh/guilhermeleobas/107/head
2025-12-04T09:33:41.5815359Z  * [new branch]              gh/guilhermeleobas/107/orig -> origin/gh/guilhermeleobas/107/orig
2025-12-04T09:33:41.5817366Z  * [new branch]              gh/guilhermeleobas/108/base -> origin/gh/guilhermeleobas/108/base
2025-12-04T09:33:41.5818472Z  * [new branch]              gh/guilhermeleobas/108/head -> origin/gh/guilhermeleobas/108/head
2025-12-04T09:33:41.5819712Z  * [new branch]              gh/guilhermeleobas/108/orig -> origin/gh/guilhermeleobas/108/orig
2025-12-04T09:33:41.5821467Z  * [new branch]              gh/guilhermeleobas/150/base -> origin/gh/guilhermeleobas/150/base
2025-12-04T09:33:41.5824557Z  * [new branch]              gh/guilhermeleobas/150/head -> origin/gh/guilhermeleobas/150/head
2025-12-04T09:33:41.5825688Z  * [new branch]              gh/guilhermeleobas/150/orig -> origin/gh/guilhermeleobas/150/orig
2025-12-04T09:33:41.5827425Z  * [new branch]              gh/guilhermeleobas/168/base -> origin/gh/guilhermeleobas/168/base
2025-12-04T09:33:41.5828692Z  * [new branch]              gh/guilhermeleobas/168/head -> origin/gh/guilhermeleobas/168/head
2025-12-04T09:33:41.5830011Z  * [new branch]              gh/guilhermeleobas/168/orig -> origin/gh/guilhermeleobas/168/orig
2025-12-04T09:33:41.5831778Z  * [new branch]              gh/guilhermeleobas/169/base -> origin/gh/guilhermeleobas/169/base
2025-12-04T09:33:41.5832926Z  * [new branch]              gh/guilhermeleobas/169/head -> origin/gh/guilhermeleobas/169/head
2025-12-04T09:33:41.5835126Z  * [new branch]              gh/guilhermeleobas/169/orig -> origin/gh/guilhermeleobas/169/orig
2025-12-04T09:33:41.5835809Z  * [new branch]              gh/guilhermeleobas/170/base -> origin/gh/guilhermeleobas/170/base
2025-12-04T09:33:41.5837309Z  * [new branch]              gh/guilhermeleobas/170/head -> origin/gh/guilhermeleobas/170/head
2025-12-04T09:33:41.5838684Z  * [new branch]              gh/guilhermeleobas/170/orig -> origin/gh/guilhermeleobas/170/orig
2025-12-04T09:33:41.5841021Z  * [new branch]              gh/guilhermeleobas/171/base -> origin/gh/guilhermeleobas/171/base
2025-12-04T09:33:41.5842249Z  * [new branch]              gh/guilhermeleobas/171/head -> origin/gh/guilhermeleobas/171/head
2025-12-04T09:33:41.5843535Z  * [new branch]              gh/guilhermeleobas/171/orig -> origin/gh/guilhermeleobas/171/orig
2025-12-04T09:33:41.5845318Z  * [new branch]              gh/guilhermeleobas/173/base -> origin/gh/guilhermeleobas/173/base
2025-12-04T09:33:41.5846528Z  * [new branch]              gh/guilhermeleobas/173/head -> origin/gh/guilhermeleobas/173/head
2025-12-04T09:33:41.5847828Z  * [new branch]              gh/guilhermeleobas/173/orig -> origin/gh/guilhermeleobas/173/orig
2025-12-04T09:33:41.5849547Z  * [new branch]              gh/guilhermeleobas/193/base -> origin/gh/guilhermeleobas/193/base
2025-12-04T09:33:41.5850877Z  * [new branch]              gh/guilhermeleobas/193/head -> origin/gh/guilhermeleobas/193/head
2025-12-04T09:33:41.5852283Z  * [new branch]              gh/guilhermeleobas/193/orig -> origin/gh/guilhermeleobas/193/orig
2025-12-04T09:33:41.5853998Z  * [new branch]              gh/guilhermeleobas/204/base -> origin/gh/guilhermeleobas/204/base
2025-12-04T09:33:41.5855566Z  * [new branch]              gh/guilhermeleobas/204/head -> origin/gh/guilhermeleobas/204/head
2025-12-04T09:33:41.5856880Z  * [new branch]              gh/guilhermeleobas/204/orig -> origin/gh/guilhermeleobas/204/orig
2025-12-04T09:33:41.5858591Z  * [new branch]              gh/guilhermeleobas/211/base -> origin/gh/guilhermeleobas/211/base
2025-12-04T09:33:41.5859852Z  * [new branch]              gh/guilhermeleobas/211/head -> origin/gh/guilhermeleobas/211/head
2025-12-04T09:33:41.5861171Z  * [new branch]              gh/guilhermeleobas/211/orig -> origin/gh/guilhermeleobas/211/orig
2025-12-04T09:33:41.5862841Z  * [new branch]              gh/guilhermeleobas/226/base -> origin/gh/guilhermeleobas/226/base
2025-12-04T09:33:41.5864100Z  * [new branch]              gh/guilhermeleobas/226/head -> origin/gh/guilhermeleobas/226/head
2025-12-04T09:33:41.5865355Z  * [new branch]              gh/guilhermeleobas/226/orig -> origin/gh/guilhermeleobas/226/orig
2025-12-04T09:33:41.5867060Z  * [new branch]              gh/guilhermeleobas/236/base -> origin/gh/guilhermeleobas/236/base
2025-12-04T09:33:41.5868321Z  * [new branch]              gh/guilhermeleobas/236/head -> origin/gh/guilhermeleobas/236/head
2025-12-04T09:33:41.5869600Z  * [new branch]              gh/guilhermeleobas/236/orig -> origin/gh/guilhermeleobas/236/orig
2025-12-04T09:33:41.5871327Z  * [new branch]              gh/guilhermeleobas/247/base -> origin/gh/guilhermeleobas/247/base
2025-12-04T09:33:41.5872674Z  * [new branch]              gh/guilhermeleobas/247/head -> origin/gh/guilhermeleobas/247/head
2025-12-04T09:33:41.5874010Z  * [new branch]              gh/guilhermeleobas/247/orig -> origin/gh/guilhermeleobas/247/orig
2025-12-04T09:33:41.5877379Z  * [new branch]              gh/guilhermeleobas/248/base -> origin/gh/guilhermeleobas/248/base
2025-12-04T09:33:41.5877676Z  * [new branch]              gh/guilhermeleobas/248/head -> origin/gh/guilhermeleobas/248/head
2025-12-04T09:33:41.5878520Z  * [new branch]              gh/guilhermeleobas/248/orig -> origin/gh/guilhermeleobas/248/orig
2025-12-04T09:33:41.5879970Z  * [new branch]              gh/guilhermeleobas/250/base -> origin/gh/guilhermeleobas/250/base
2025-12-04T09:33:41.5881131Z  * [new branch]              gh/guilhermeleobas/250/head -> origin/gh/guilhermeleobas/250/head
2025-12-04T09:33:41.5882527Z  * [new branch]              gh/guilhermeleobas/250/orig -> origin/gh/guilhermeleobas/250/orig
2025-12-04T09:33:41.5884971Z  * [new branch]              gh/guilhermeleobas/253/base -> origin/gh/guilhermeleobas/253/base
2025-12-04T09:33:41.5886234Z  * [new branch]              gh/guilhermeleobas/253/head -> origin/gh/guilhermeleobas/253/head
2025-12-04T09:33:41.5887608Z  * [new branch]              gh/guilhermeleobas/253/orig -> origin/gh/guilhermeleobas/253/orig
2025-12-04T09:33:41.5889414Z  * [new branch]              gh/guilhermeleobas/254/base -> origin/gh/guilhermeleobas/254/base
2025-12-04T09:33:41.5890802Z  * [new branch]              gh/guilhermeleobas/254/head -> origin/gh/guilhermeleobas/254/head
2025-12-04T09:33:41.5892079Z  * [new branch]              gh/guilhermeleobas/254/orig -> origin/gh/guilhermeleobas/254/orig
2025-12-04T09:33:41.5893823Z  * [new branch]              gh/guilhermeleobas/255/base -> origin/gh/guilhermeleobas/255/base
2025-12-04T09:33:41.5895118Z  * [new branch]              gh/guilhermeleobas/255/head -> origin/gh/guilhermeleobas/255/head
2025-12-04T09:33:41.5896412Z  * [new branch]              gh/guilhermeleobas/255/orig -> origin/gh/guilhermeleobas/255/orig
2025-12-04T09:33:41.5898206Z  * [new branch]              gh/guilhermeleobas/256/base -> origin/gh/guilhermeleobas/256/base
2025-12-04T09:33:41.5899524Z  * [new branch]              gh/guilhermeleobas/256/head -> origin/gh/guilhermeleobas/256/head
2025-12-04T09:33:41.5901126Z  * [new branch]              gh/guilhermeleobas/256/orig -> origin/gh/guilhermeleobas/256/orig
2025-12-04T09:33:41.5902926Z  * [new branch]              gh/guilhermeleobas/257/base -> origin/gh/guilhermeleobas/257/base
2025-12-04T09:33:41.5904172Z  * [new branch]              gh/guilhermeleobas/257/head -> origin/gh/guilhermeleobas/257/head
2025-12-04T09:33:41.5905655Z  * [new branch]              gh/guilhermeleobas/257/orig -> origin/gh/guilhermeleobas/257/orig
2025-12-04T09:33:41.5907601Z  * [new branch]              gh/guilhermeleobas/258/base -> origin/gh/guilhermeleobas/258/base
2025-12-04T09:33:41.5908714Z  * [new branch]              gh/guilhermeleobas/258/head -> origin/gh/guilhermeleobas/258/head
2025-12-04T09:33:41.5910113Z  * [new branch]              gh/guilhermeleobas/258/orig -> origin/gh/guilhermeleobas/258/orig
2025-12-04T09:33:41.5911792Z  * [new branch]              gh/guilhermeleobas/259/base -> origin/gh/guilhermeleobas/259/base
2025-12-04T09:33:41.5913126Z  * [new branch]              gh/guilhermeleobas/259/head -> origin/gh/guilhermeleobas/259/head
2025-12-04T09:33:41.5914399Z  * [new branch]              gh/guilhermeleobas/259/orig -> origin/gh/guilhermeleobas/259/orig
2025-12-04T09:33:41.5916161Z  * [new branch]              gh/guilhermeleobas/260/base -> origin/gh/guilhermeleobas/260/base
2025-12-04T09:33:41.5917430Z  * [new branch]              gh/guilhermeleobas/260/head -> origin/gh/guilhermeleobas/260/head
2025-12-04T09:33:41.5918710Z  * [new branch]              gh/guilhermeleobas/260/orig -> origin/gh/guilhermeleobas/260/orig
2025-12-04T09:33:41.5920496Z  * [new branch]              gh/guilhermeleobas/261/base -> origin/gh/guilhermeleobas/261/base
2025-12-04T09:33:41.5921707Z  * [new branch]              gh/guilhermeleobas/261/head -> origin/gh/guilhermeleobas/261/head
2025-12-04T09:33:41.5923133Z  * [new branch]              gh/guilhermeleobas/261/orig -> origin/gh/guilhermeleobas/261/orig
2025-12-04T09:33:41.5924899Z  * [new branch]              gh/guilhermeleobas/262/base -> origin/gh/guilhermeleobas/262/base
2025-12-04T09:33:41.5926397Z  * [new branch]              gh/guilhermeleobas/262/head -> origin/gh/guilhermeleobas/262/head
2025-12-04T09:33:41.5927631Z  * [new branch]              gh/guilhermeleobas/262/orig -> origin/gh/guilhermeleobas/262/orig
2025-12-04T09:33:41.5929503Z  * [new branch]              gh/guilhermeleobas/263/base -> origin/gh/guilhermeleobas/263/base
2025-12-04T09:33:41.5930754Z  * [new branch]              gh/guilhermeleobas/263/head -> origin/gh/guilhermeleobas/263/head
2025-12-04T09:33:41.5932011Z  * [new branch]              gh/guilhermeleobas/263/orig -> origin/gh/guilhermeleobas/263/orig
2025-12-04T09:33:41.5933826Z  * [new branch]              gh/guilhermeleobas/264/base -> origin/gh/guilhermeleobas/264/base
2025-12-04T09:33:41.5935096Z  * [new branch]              gh/guilhermeleobas/264/head -> origin/gh/guilhermeleobas/264/head
2025-12-04T09:33:41.5936373Z  * [new branch]              gh/guilhermeleobas/264/orig -> origin/gh/guilhermeleobas/264/orig
2025-12-04T09:33:41.5938132Z  * [new branch]              gh/guilhermeleobas/265/base -> origin/gh/guilhermeleobas/265/base
2025-12-04T09:33:41.5939424Z  * [new branch]              gh/guilhermeleobas/265/head -> origin/gh/guilhermeleobas/265/head
2025-12-04T09:33:41.5940795Z  * [new branch]              gh/guilhermeleobas/265/orig -> origin/gh/guilhermeleobas/265/orig
2025-12-04T09:33:41.5942524Z  * [new branch]              gh/guilhermeleobas/266/base -> origin/gh/guilhermeleobas/266/base
2025-12-04T09:33:41.5943922Z  * [new branch]              gh/guilhermeleobas/266/head -> origin/gh/guilhermeleobas/266/head
2025-12-04T09:33:41.5945206Z  * [new branch]              gh/guilhermeleobas/266/orig -> origin/gh/guilhermeleobas/266/orig
2025-12-04T09:33:41.5946992Z  * [new branch]              gh/guilhermeleobas/267/base -> origin/gh/guilhermeleobas/267/base
2025-12-04T09:33:41.5948257Z  * [new branch]              gh/guilhermeleobas/267/head -> origin/gh/guilhermeleobas/267/head
2025-12-04T09:33:41.5949553Z  * [new branch]              gh/guilhermeleobas/267/orig -> origin/gh/guilhermeleobas/267/orig
2025-12-04T09:33:41.5952174Z  * [new branch]              gh/hameerabbasi/1/base      -> origin/gh/hameerabbasi/1/base
2025-12-04T09:33:41.5953977Z  * [new branch]              gh/hameerabbasi/1/head      -> origin/gh/hameerabbasi/1/head
2025-12-04T09:33:41.5955202Z  * [new branch]              gh/hameerabbasi/2/base      -> origin/gh/hameerabbasi/2/base
2025-12-04T09:33:41.5956463Z  * [new branch]              gh/hameerabbasi/2/head      -> origin/gh/hameerabbasi/2/head
2025-12-04T09:33:41.5957800Z  * [new branch]              gh/hameerabbasi/2/orig      -> origin/gh/hameerabbasi/2/orig
2025-12-04T09:33:41.5959432Z  * [new branch]              gh/hameerabbasi/3/base      -> origin/gh/hameerabbasi/3/base
2025-12-04T09:33:41.5960840Z  * [new branch]              gh/hameerabbasi/3/head      -> origin/gh/hameerabbasi/3/head
2025-12-04T09:33:41.5962331Z  * [new branch]              gh/hameerabbasi/3/orig      -> origin/gh/hameerabbasi/3/orig
2025-12-04T09:33:41.5964045Z  * [new branch]              gh/hameerabbasi/4/base      -> origin/gh/hameerabbasi/4/base
2025-12-04T09:33:41.5965347Z  * [new branch]              gh/hameerabbasi/4/head      -> origin/gh/hameerabbasi/4/head
2025-12-04T09:33:41.5966487Z  * [new branch]              gh/hameerabbasi/4/orig      -> origin/gh/hameerabbasi/4/orig
2025-12-04T09:33:41.5968458Z  * [new branch]              gh/huydhn/1/next            -> origin/gh/huydhn/1/next
2025-12-04T09:33:41.5970012Z  * [new branch]              gh/huydhn/2/next            -> origin/gh/huydhn/2/next
2025-12-04T09:33:41.5971671Z  * [new branch]              gh/huydhn/3/next            -> origin/gh/huydhn/3/next
2025-12-04T09:33:41.5973391Z  * [new branch]              gh/huydhn/4/next            -> origin/gh/huydhn/4/next
2025-12-04T09:33:41.5975119Z  * [new branch]              gh/huydhn/5/next            -> origin/gh/huydhn/5/next
2025-12-04T09:33:41.5976764Z  * [new branch]              gh/huydhn/6/next            -> origin/gh/huydhn/6/next
2025-12-04T09:33:41.5978839Z  * [new branch]              gh/int3/97/base             -> origin/gh/int3/97/base
2025-12-04T09:33:41.5980121Z  * [new branch]              gh/int3/97/head             -> origin/gh/int3/97/head
2025-12-04T09:33:41.5982250Z  * [new branch]              gh/isuruf/101/base          -> origin/gh/isuruf/101/base
2025-12-04T09:33:41.5983460Z  * [new branch]              gh/isuruf/101/head          -> origin/gh/isuruf/101/head
2025-12-04T09:33:41.5985763Z  * [new branch]              gh/isuruf/146/base          -> origin/gh/isuruf/146/base
2025-12-04T09:33:41.5987035Z  * [new branch]              gh/isuruf/146/head          -> origin/gh/isuruf/146/head
2025-12-04T09:33:41.5988340Z  * [new branch]              gh/isuruf/146/orig          -> origin/gh/isuruf/146/orig
2025-12-04T09:33:41.5990588Z  * [new branch]              gh/isuruf/158/base          -> origin/gh/isuruf/158/base
2025-12-04T09:33:41.5991843Z  * [new branch]              gh/isuruf/158/head          -> origin/gh/isuruf/158/head
2025-12-04T09:33:41.5993434Z  * [new branch]              gh/isuruf/159/base          -> origin/gh/isuruf/159/base
2025-12-04T09:33:41.5994684Z  * [new branch]              gh/isuruf/159/head          -> origin/gh/isuruf/159/head
2025-12-04T09:33:41.5996403Z  * [new branch]              gh/isuruf/160/base          -> origin/gh/isuruf/160/base
2025-12-04T09:33:41.5997652Z  * [new branch]              gh/isuruf/160/head          -> origin/gh/isuruf/160/head
2025-12-04T09:33:41.5998982Z  * [new branch]              gh/isuruf/160/orig          -> origin/gh/isuruf/160/orig
2025-12-04T09:33:41.6000650Z  * [new branch]              gh/isuruf/81/base           -> origin/gh/isuruf/81/base
2025-12-04T09:33:41.6005104Z  * [new branch]              gh/isuruf/81/head           -> origin/gh/isuruf/81/head
2025-12-04T09:33:41.6006410Z  * [new branch]              gh/isuruf/81/orig           -> origin/gh/isuruf/81/orig
2025-12-04T09:33:41.6008450Z  * [new branch]              gh/jamesjwu/176/base        -> origin/gh/jamesjwu/176/base
2025-12-04T09:33:41.6009902Z  * [new branch]              gh/jamesjwu/176/head        -> origin/gh/jamesjwu/176/head
2025-12-04T09:33:41.6011163Z  * [new branch]              gh/jamesjwu/176/orig        -> origin/gh/jamesjwu/176/orig
2025-12-04T09:33:41.6012841Z  * [new branch]              gh/jamesjwu/187/base        -> origin/gh/jamesjwu/187/base
2025-12-04T09:33:41.6014077Z  * [new branch]              gh/jamesjwu/187/head        -> origin/gh/jamesjwu/187/head
2025-12-04T09:33:41.6015365Z  * [new branch]              gh/jamesjwu/187/orig        -> origin/gh/jamesjwu/187/orig
2025-12-04T09:33:41.6017070Z  * [new branch]              gh/jamesjwu/196/base        -> origin/gh/jamesjwu/196/base
2025-12-04T09:33:41.6018357Z  * [new branch]              gh/jamesjwu/196/head        -> origin/gh/jamesjwu/196/head
2025-12-04T09:33:41.6019653Z  * [new branch]              gh/jamesjwu/196/orig        -> origin/gh/jamesjwu/196/orig
2025-12-04T09:33:41.6021326Z  * [new branch]              gh/jamesjwu/198/base        -> origin/gh/jamesjwu/198/base
2025-12-04T09:33:41.6022613Z  * [new branch]              gh/jamesjwu/198/head        -> origin/gh/jamesjwu/198/head
2025-12-04T09:33:41.6023853Z  * [new branch]              gh/jamesjwu/198/orig        -> origin/gh/jamesjwu/198/orig
2025-12-04T09:33:41.6025589Z  * [new branch]              gh/jamesjwu/207/base        -> origin/gh/jamesjwu/207/base
2025-12-04T09:33:41.6027174Z  * [new branch]              gh/jamesjwu/207/head        -> origin/gh/jamesjwu/207/head
2025-12-04T09:33:41.6028537Z  * [new branch]              gh/jamesjwu/207/orig        -> origin/gh/jamesjwu/207/orig
2025-12-04T09:33:41.6030361Z  * [new branch]              gh/jamesjwu/208/base        -> origin/gh/jamesjwu/208/base
2025-12-04T09:33:41.6031679Z  * [new branch]              gh/jamesjwu/208/head        -> origin/gh/jamesjwu/208/head
2025-12-04T09:33:41.6032942Z  * [new branch]              gh/jamesjwu/208/orig        -> origin/gh/jamesjwu/208/orig
2025-12-04T09:33:41.6034703Z  * [new branch]              gh/jamesjwu/52/base         -> origin/gh/jamesjwu/52/base
2025-12-04T09:33:41.6035974Z  * [new branch]              gh/jamesjwu/52/head         -> origin/gh/jamesjwu/52/head
2025-12-04T09:33:41.6037660Z  * [new branch]              gh/jamesjwu/53/base         -> origin/gh/jamesjwu/53/base
2025-12-04T09:33:41.6038779Z  * [new branch]              gh/jamesjwu/53/head         -> origin/gh/jamesjwu/53/head
2025-12-04T09:33:41.6040275Z  * [new branch]              gh/jamesjwu/54/base         -> origin/gh/jamesjwu/54/base
2025-12-04T09:33:41.6041504Z  * [new branch]              gh/jamesjwu/54/head         -> origin/gh/jamesjwu/54/head
2025-12-04T09:33:41.6043250Z  * [new branch]              gh/jamesjwu/55/base         -> origin/gh/jamesjwu/55/base
2025-12-04T09:33:41.6044462Z  * [new branch]              gh/jamesjwu/55/head         -> origin/gh/jamesjwu/55/head
2025-12-04T09:33:41.6045953Z  * [new branch]              gh/jamesjwu/56/base         -> origin/gh/jamesjwu/56/base
2025-12-04T09:33:41.6047181Z  * [new branch]              gh/jamesjwu/56/head         -> origin/gh/jamesjwu/56/head
2025-12-04T09:33:41.6048704Z  * [new branch]              gh/jamesjwu/57/base         -> origin/gh/jamesjwu/57/base
2025-12-04T09:33:41.6049932Z  * [new branch]              gh/jamesjwu/57/head         -> origin/gh/jamesjwu/57/head
2025-12-04T09:33:41.6051462Z  * [new branch]              gh/jamesjwu/58/base         -> origin/gh/jamesjwu/58/base
2025-12-04T09:33:41.6052681Z  * [new branch]              gh/jamesjwu/58/head         -> origin/gh/jamesjwu/58/head
2025-12-04T09:33:41.6054245Z  * [new branch]              gh/jamesjwu/59/base         -> origin/gh/jamesjwu/59/base
2025-12-04T09:33:41.6055510Z  * [new branch]              gh/jamesjwu/59/head         -> origin/gh/jamesjwu/59/head
2025-12-04T09:33:41.6057036Z  * [new branch]              gh/jamesjwu/60/base         -> origin/gh/jamesjwu/60/base
2025-12-04T09:33:41.6058442Z  * [new branch]              gh/jamesjwu/60/head         -> origin/gh/jamesjwu/60/head
2025-12-04T09:33:41.6059845Z  * [new branch]              gh/jamesjwu/61/base         -> origin/gh/jamesjwu/61/base
2025-12-04T09:33:41.6061058Z  * [new branch]              gh/jamesjwu/61/head         -> origin/gh/jamesjwu/61/head
2025-12-04T09:33:41.6062594Z  * [new branch]              gh/jamesjwu/62/base         -> origin/gh/jamesjwu/62/base
2025-12-04T09:33:41.6063759Z  * [new branch]              gh/jamesjwu/62/head         -> origin/gh/jamesjwu/62/head
2025-12-04T09:33:41.6065287Z  * [new branch]              gh/jamesjwu/63/base         -> origin/gh/jamesjwu/63/base
2025-12-04T09:33:41.6066557Z  * [new branch]              gh/jamesjwu/63/head         -> origin/gh/jamesjwu/63/head
2025-12-04T09:33:41.6068820Z  * [new branch]              gh/jamesjwu/64/base         -> origin/gh/jamesjwu/64/base
2025-12-04T09:33:41.6070097Z  * [new branch]              gh/jamesjwu/64/head         -> origin/gh/jamesjwu/64/head
2025-12-04T09:33:41.6072026Z  * [new branch]              gh/jamesjwu/65/base         -> origin/gh/jamesjwu/65/base
2025-12-04T09:33:41.6073266Z  * [new branch]              gh/jamesjwu/65/head         -> origin/gh/jamesjwu/65/head
2025-12-04T09:33:41.6075531Z  * [new branch]              gh/janeyx99/165/base        -> origin/gh/janeyx99/165/base
2025-12-04T09:33:41.6076880Z  * [new branch]              gh/janeyx99/165/head        -> origin/gh/janeyx99/165/head
2025-12-04T09:33:41.6078145Z  * [new branch]              gh/janeyx99/165/orig        -> origin/gh/janeyx99/165/orig
2025-12-04T09:33:41.6079724Z  * [new branch]              gh/janeyx99/201/base        -> origin/gh/janeyx99/201/base
2025-12-04T09:33:41.6080970Z  * [new branch]              gh/janeyx99/201/head        -> origin/gh/janeyx99/201/head
2025-12-04T09:33:41.6082290Z  * [new branch]              gh/janeyx99/201/orig        -> origin/gh/janeyx99/201/orig
2025-12-04T09:33:41.6084385Z  * [new branch]              gh/janeyx99/225/base        -> origin/gh/janeyx99/225/base
2025-12-04T09:33:41.6085685Z  * [new branch]              gh/janeyx99/225/head        -> origin/gh/janeyx99/225/head
2025-12-04T09:33:41.6086961Z  * [new branch]              gh/janeyx99/225/orig        -> origin/gh/janeyx99/225/orig
2025-12-04T09:33:41.6088664Z  * [new branch]              gh/janeyx99/299/base        -> origin/gh/janeyx99/299/base
2025-12-04T09:33:41.6090022Z  * [new branch]              gh/janeyx99/299/head        -> origin/gh/janeyx99/299/head
2025-12-04T09:33:41.6091364Z  * [new branch]              gh/janeyx99/299/orig        -> origin/gh/janeyx99/299/orig
2025-12-04T09:33:41.6093692Z  * [new branch]              gh/janeyx99/302/base        -> origin/gh/janeyx99/302/base
2025-12-04T09:33:41.6094960Z  * [new branch]              gh/janeyx99/302/head        -> origin/gh/janeyx99/302/head
2025-12-04T09:33:41.6096538Z  * [new branch]              gh/janeyx99/303/base        -> origin/gh/janeyx99/303/base
2025-12-04T09:33:41.6097840Z  * [new branch]              gh/janeyx99/303/head        -> origin/gh/janeyx99/303/head
2025-12-04T09:33:41.6099494Z  * [new branch]              gh/janeyx99/305/base        -> origin/gh/janeyx99/305/base
2025-12-04T09:33:41.6100944Z  * [new branch]              gh/janeyx99/305/head        -> origin/gh/janeyx99/305/head
2025-12-04T09:33:41.6102651Z  * [new branch]              gh/janeyx99/306/base        -> origin/gh/janeyx99/306/base
2025-12-04T09:33:41.6103856Z  * [new branch]              gh/janeyx99/306/head        -> origin/gh/janeyx99/306/head
2025-12-04T09:33:41.6105547Z  * [new branch]              gh/janeyx99/314/base        -> origin/gh/janeyx99/314/base
2025-12-04T09:33:41.6106914Z  * [new branch]              gh/janeyx99/314/head        -> origin/gh/janeyx99/314/head
2025-12-04T09:33:41.6108316Z  * [new branch]              gh/janeyx99/314/orig        -> origin/gh/janeyx99/314/orig
2025-12-04T09:33:41.6109998Z  * [new branch]              gh/janeyx99/315/base        -> origin/gh/janeyx99/315/base
2025-12-04T09:33:41.6111316Z  * [new branch]              gh/janeyx99/315/head        -> origin/gh/janeyx99/315/head
2025-12-04T09:33:41.6112591Z  * [new branch]              gh/janeyx99/315/orig        -> origin/gh/janeyx99/315/orig
2025-12-04T09:33:41.6114316Z  * [new branch]              gh/janeyx99/316/base        -> origin/gh/janeyx99/316/base
2025-12-04T09:33:41.6115591Z  * [new branch]              gh/janeyx99/316/head        -> origin/gh/janeyx99/316/head
2025-12-04T09:33:41.6116838Z  * [new branch]              gh/janeyx99/316/orig        -> origin/gh/janeyx99/316/orig
2025-12-04T09:33:41.6118758Z  * [new branch]              gh/janeyx99/317/base        -> origin/gh/janeyx99/317/base
2025-12-04T09:33:41.6120061Z  * [new branch]              gh/janeyx99/317/head        -> origin/gh/janeyx99/317/head
2025-12-04T09:33:41.6121311Z  * [new branch]              gh/janeyx99/317/orig        -> origin/gh/janeyx99/317/orig
2025-12-04T09:33:41.6123183Z  * [new branch]              gh/janeyx99/325/base        -> origin/gh/janeyx99/325/base
2025-12-04T09:33:41.6124448Z  * [new branch]              gh/janeyx99/325/head        -> origin/gh/janeyx99/325/head
2025-12-04T09:33:41.6125823Z  * [new branch]              gh/janeyx99/325/orig        -> origin/gh/janeyx99/325/orig
2025-12-04T09:33:41.6127519Z  * [new branch]              gh/janeyx99/327/base        -> origin/gh/janeyx99/327/base
2025-12-04T09:33:41.6129287Z  * [new branch]              gh/janeyx99/327/head        -> origin/gh/janeyx99/327/head
2025-12-04T09:33:41.6130985Z  * [new branch]              gh/janeyx99/327/orig        -> origin/gh/janeyx99/327/orig
2025-12-04T09:33:41.6132745Z  * [new branch]              gh/janeyx99/328/base        -> origin/gh/janeyx99/328/base
2025-12-04T09:33:41.6134069Z  * [new branch]              gh/janeyx99/328/head        -> origin/gh/janeyx99/328/head
2025-12-04T09:33:41.6135382Z  * [new branch]              gh/janeyx99/328/orig        -> origin/gh/janeyx99/328/orig
2025-12-04T09:33:41.6136948Z  * [new branch]              gh/janeyx99/329/base        -> origin/gh/janeyx99/329/base
2025-12-04T09:33:41.6138265Z  * [new branch]              gh/janeyx99/329/head        -> origin/gh/janeyx99/329/head
2025-12-04T09:33:41.6139550Z  * [new branch]              gh/janeyx99/329/orig        -> origin/gh/janeyx99/329/orig
2025-12-04T09:33:41.6142248Z  * [new branch]              gh/janeyx99/330/base        -> origin/gh/janeyx99/330/base
2025-12-04T09:33:41.6143714Z  * [new branch]              gh/janeyx99/330/head        -> origin/gh/janeyx99/330/head
2025-12-04T09:33:41.6144989Z  * [new branch]              gh/janeyx99/330/orig        -> origin/gh/janeyx99/330/orig
2025-12-04T09:33:41.6147257Z  * [new branch]              gh/janeyx99/331/base        -> origin/gh/janeyx99/331/base
2025-12-04T09:33:41.6148818Z  * [new branch]              gh/janeyx99/331/head        -> origin/gh/janeyx99/331/head
2025-12-04T09:33:41.6149881Z  * [new branch]              gh/janeyx99/331/orig        -> origin/gh/janeyx99/331/orig
2025-12-04T09:33:41.6151603Z  * [new branch]              gh/janeyx99/332/base        -> origin/gh/janeyx99/332/base
2025-12-04T09:33:41.6152857Z  * [new branch]              gh/janeyx99/332/head        -> origin/gh/janeyx99/332/head
2025-12-04T09:33:41.6154118Z  * [new branch]              gh/janeyx99/332/orig        -> origin/gh/janeyx99/332/orig
2025-12-04T09:33:41.6155705Z  * [new branch]              gh/janeyx99/333/base        -> origin/gh/janeyx99/333/base
2025-12-04T09:33:41.6156995Z  * [new branch]              gh/janeyx99/333/head        -> origin/gh/janeyx99/333/head
2025-12-04T09:33:41.6158228Z  * [new branch]              gh/janeyx99/333/orig        -> origin/gh/janeyx99/333/orig
2025-12-04T09:33:41.6160111Z  * [new branch]              gh/janeyx99/88/base         -> origin/gh/janeyx99/88/base
2025-12-04T09:33:41.6161590Z  * [new branch]              gh/janeyx99/88/head         -> origin/gh/janeyx99/88/head
2025-12-04T09:33:41.6162784Z  * [new branch]              gh/janeyx99/88/orig         -> origin/gh/janeyx99/88/orig
2025-12-04T09:33:41.6164976Z  * [new branch]              gh/jansel/360/base          -> origin/gh/jansel/360/base
2025-12-04T09:33:41.6166189Z  * [new branch]              gh/jansel/360/head          -> origin/gh/jansel/360/head
2025-12-04T09:33:41.6167838Z  * [new branch]              gh/jansel/451/base          -> origin/gh/jansel/451/base
2025-12-04T09:33:41.6169227Z  * [new branch]              gh/jansel/451/head          -> origin/gh/jansel/451/head
2025-12-04T09:33:41.6170494Z  * [new branch]              gh/jansel/451/orig          -> origin/gh/jansel/451/orig
2025-12-04T09:33:41.6172140Z  * [new branch]              gh/jansel/462/base          -> origin/gh/jansel/462/base
2025-12-04T09:33:41.6173365Z  * [new branch]              gh/jansel/462/head          -> origin/gh/jansel/462/head
2025-12-04T09:33:41.6174639Z  * [new branch]              gh/jansel/462/orig          -> origin/gh/jansel/462/orig
2025-12-04T09:33:41.6176305Z  * [new branch]              gh/jansel/533/base          -> origin/gh/jansel/533/base
2025-12-04T09:33:41.6177504Z  * [new branch]              gh/jansel/533/head          -> origin/gh/jansel/533/head
2025-12-04T09:33:41.6178866Z  * [new branch]              gh/jansel/533/orig          -> origin/gh/jansel/533/orig
2025-12-04T09:33:41.6180530Z  * [new branch]              gh/jansel/552/base          -> origin/gh/jansel/552/base
2025-12-04T09:33:41.6181786Z  * [new branch]              gh/jansel/552/head          -> origin/gh/jansel/552/head
2025-12-04T09:33:41.6183015Z  * [new branch]              gh/jansel/552/orig          -> origin/gh/jansel/552/orig
2025-12-04T09:33:41.6184718Z  * [new branch]              gh/jansel/553/base          -> origin/gh/jansel/553/base
2025-12-04T09:33:41.6185960Z  * [new branch]              gh/jansel/553/head          -> origin/gh/jansel/553/head
2025-12-04T09:33:41.6187217Z  * [new branch]              gh/jansel/553/orig          -> origin/gh/jansel/553/orig
2025-12-04T09:33:41.6188908Z  * [new branch]              gh/jansel/554/base          -> origin/gh/jansel/554/base
2025-12-04T09:33:41.6190159Z  * [new branch]              gh/jansel/554/head          -> origin/gh/jansel/554/head
2025-12-04T09:33:41.6191421Z  * [new branch]              gh/jansel/554/orig          -> origin/gh/jansel/554/orig
2025-12-04T09:33:41.6193074Z  * [new branch]              gh/jansel/555/base          -> origin/gh/jansel/555/base
2025-12-04T09:33:41.6194535Z  * [new branch]              gh/jansel/555/head          -> origin/gh/jansel/555/head
2025-12-04T09:33:41.6195862Z  * [new branch]              gh/jansel/555/orig          -> origin/gh/jansel/555/orig
2025-12-04T09:33:41.6197617Z  * [new branch]              gh/jansel/556/base          -> origin/gh/jansel/556/base
2025-12-04T09:33:41.6198892Z  * [new branch]              gh/jansel/556/head          -> origin/gh/jansel/556/head
2025-12-04T09:33:41.6200150Z  * [new branch]              gh/jansel/556/orig          -> origin/gh/jansel/556/orig
2025-12-04T09:33:41.6202072Z  * [new branch]              gh/jansel/557/base          -> origin/gh/jansel/557/base
2025-12-04T09:33:41.6203621Z  * [new branch]              gh/jansel/557/head          -> origin/gh/jansel/557/head
2025-12-04T09:33:41.6204705Z  * [new branch]              gh/jansel/557/orig          -> origin/gh/jansel/557/orig
2025-12-04T09:33:41.6206397Z  * [new branch]              gh/jansel/558/base          -> origin/gh/jansel/558/base
2025-12-04T09:33:41.6207688Z  * [new branch]              gh/jansel/558/head          -> origin/gh/jansel/558/head
2025-12-04T09:33:41.6208962Z  * [new branch]              gh/jansel/558/orig          -> origin/gh/jansel/558/orig
2025-12-04T09:33:41.6210609Z  * [new branch]              gh/jansel/559/base          -> origin/gh/jansel/559/base
2025-12-04T09:33:41.6211897Z  * [new branch]              gh/jansel/559/head          -> origin/gh/jansel/559/head
2025-12-04T09:33:41.6213287Z  * [new branch]              gh/jansel/559/orig          -> origin/gh/jansel/559/orig
2025-12-04T09:33:41.6214992Z  * [new branch]              gh/jansel/560/base          -> origin/gh/jansel/560/base
2025-12-04T09:33:41.6216230Z  * [new branch]              gh/jansel/560/head          -> origin/gh/jansel/560/head
2025-12-04T09:33:41.6217472Z  * [new branch]              gh/jansel/560/orig          -> origin/gh/jansel/560/orig
2025-12-04T09:33:41.6219181Z  * [new branch]              gh/jansel/561/base          -> origin/gh/jansel/561/base
2025-12-04T09:33:41.6220437Z  * [new branch]              gh/jansel/561/head          -> origin/gh/jansel/561/head
2025-12-04T09:33:41.6221666Z  * [new branch]              gh/jansel/561/orig          -> origin/gh/jansel/561/orig
2025-12-04T09:33:41.6223345Z  * [new branch]              gh/jansel/562/base          -> origin/gh/jansel/562/base
2025-12-04T09:33:41.6224593Z  * [new branch]              gh/jansel/562/head          -> origin/gh/jansel/562/head
2025-12-04T09:33:41.6225867Z  * [new branch]              gh/jansel/562/orig          -> origin/gh/jansel/562/orig
2025-12-04T09:33:41.6227505Z  * [new branch]              gh/jansel/563/base          -> origin/gh/jansel/563/base
2025-12-04T09:33:41.6228762Z  * [new branch]              gh/jansel/563/head          -> origin/gh/jansel/563/head
2025-12-04T09:33:41.6230375Z  * [new branch]              gh/jansel/563/orig          -> origin/gh/jansel/563/orig
2025-12-04T09:33:41.6232281Z  * [new branch]              gh/jansel/564/base          -> origin/gh/jansel/564/base
2025-12-04T09:33:41.6233541Z  * [new branch]              gh/jansel/564/head          -> origin/gh/jansel/564/head
2025-12-04T09:33:41.6234826Z  * [new branch]              gh/jansel/564/orig          -> origin/gh/jansel/564/orig
2025-12-04T09:33:41.6236591Z  * [new branch]              gh/jansel/565/base          -> origin/gh/jansel/565/base
2025-12-04T09:33:41.6237843Z  * [new branch]              gh/jansel/565/head          -> origin/gh/jansel/565/head
2025-12-04T09:33:41.6239129Z  * [new branch]              gh/jansel/565/orig          -> origin/gh/jansel/565/orig
2025-12-04T09:33:41.6240883Z  * [new branch]              gh/jansel/566/base          -> origin/gh/jansel/566/base
2025-12-04T09:33:41.6242242Z  * [new branch]              gh/jansel/566/head          -> origin/gh/jansel/566/head
2025-12-04T09:33:41.6243553Z  * [new branch]              gh/jansel/566/orig          -> origin/gh/jansel/566/orig
2025-12-04T09:33:41.6245278Z  * [new branch]              gh/jansel/567/base          -> origin/gh/jansel/567/base
2025-12-04T09:33:41.6246662Z  * [new branch]              gh/jansel/567/head          -> origin/gh/jansel/567/head
2025-12-04T09:33:41.6247962Z  * [new branch]              gh/jansel/567/orig          -> origin/gh/jansel/567/orig
2025-12-04T09:33:41.6249802Z  * [new branch]              gh/jansel/568/base          -> origin/gh/jansel/568/base
2025-12-04T09:33:41.6251083Z  * [new branch]              gh/jansel/568/head          -> origin/gh/jansel/568/head
2025-12-04T09:33:41.6252354Z  * [new branch]              gh/jansel/568/orig          -> origin/gh/jansel/568/orig
2025-12-04T09:33:41.6254063Z  * [new branch]              gh/jansel/569/base          -> origin/gh/jansel/569/base
2025-12-04T09:33:41.6255297Z  * [new branch]              gh/jansel/569/head          -> origin/gh/jansel/569/head
2025-12-04T09:33:41.6256565Z  * [new branch]              gh/jansel/569/orig          -> origin/gh/jansel/569/orig
2025-12-04T09:33:41.6258778Z  * [new branch]              gh/jansel/570/base          -> origin/gh/jansel/570/base
2025-12-04T09:33:41.6260087Z  * [new branch]              gh/jansel/570/head          -> origin/gh/jansel/570/head
2025-12-04T09:33:41.6261314Z  * [new branch]              gh/jansel/570/orig          -> origin/gh/jansel/570/orig
2025-12-04T09:33:41.6263021Z  * [new branch]              gh/jansel/571/base          -> origin/gh/jansel/571/base
2025-12-04T09:33:41.6264322Z  * [new branch]              gh/jansel/571/head          -> origin/gh/jansel/571/head
2025-12-04T09:33:41.6265707Z  * [new branch]              gh/jansel/571/orig          -> origin/gh/jansel/571/orig
2025-12-04T09:33:41.6267310Z  * [new branch]              gh/jansel/572/base          -> origin/gh/jansel/572/base
2025-12-04T09:33:41.6269037Z  * [new branch]              gh/jansel/572/head          -> origin/gh/jansel/572/head
2025-12-04T09:33:41.6270328Z  * [new branch]              gh/jansel/572/orig          -> origin/gh/jansel/572/orig
2025-12-04T09:33:41.6272137Z  * [new branch]              gh/jansel/573/base          -> origin/gh/jansel/573/base
2025-12-04T09:33:41.6273402Z  * [new branch]              gh/jansel/573/head          -> origin/gh/jansel/573/head
2025-12-04T09:33:41.6274671Z  * [new branch]              gh/jansel/573/orig          -> origin/gh/jansel/573/orig
2025-12-04T09:33:41.6276416Z  * [new branch]              gh/jansel/574/base          -> origin/gh/jansel/574/base
2025-12-04T09:33:41.6277718Z  * [new branch]              gh/jansel/574/head          -> origin/gh/jansel/574/head
2025-12-04T09:33:41.6278981Z  * [new branch]              gh/jansel/574/orig          -> origin/gh/jansel/574/orig
2025-12-04T09:33:41.6280972Z  * [new branch]              gh/jansel/575/base          -> origin/gh/jansel/575/base
2025-12-04T09:33:41.6282334Z  * [new branch]              gh/jansel/575/head          -> origin/gh/jansel/575/head
2025-12-04T09:33:41.6283795Z  * [new branch]              gh/jansel/575/orig          -> origin/gh/jansel/575/orig
2025-12-04T09:33:41.6285607Z  * [new branch]              gh/jansel/576/base          -> origin/gh/jansel/576/base
2025-12-04T09:33:41.6286841Z  * [new branch]              gh/jansel/576/head          -> origin/gh/jansel/576/head
2025-12-04T09:33:41.6288126Z  * [new branch]              gh/jansel/576/orig          -> origin/gh/jansel/576/orig
2025-12-04T09:33:41.6290242Z  * [new branch]              gh/jbschlosser/247/base     -> origin/gh/jbschlosser/247/base
2025-12-04T09:33:41.6292034Z  * [new branch]              gh/jbschlosser/247/head     -> origin/gh/jbschlosser/247/head
2025-12-04T09:33:41.6293315Z  * [new branch]              gh/jbschlosser/247/orig     -> origin/gh/jbschlosser/247/orig
2025-12-04T09:33:41.6295071Z  * [new branch]              gh/jbschlosser/250/base     -> origin/gh/jbschlosser/250/base
2025-12-04T09:33:41.6296277Z  * [new branch]              gh/jbschlosser/250/head     -> origin/gh/jbschlosser/250/head
2025-12-04T09:33:41.6297586Z  * [new branch]              gh/jbschlosser/250/orig     -> origin/gh/jbschlosser/250/orig
2025-12-04T09:33:41.6300217Z  * [new branch]              gh/jerryzh168/1/base        -> origin/gh/jerryzh168/1/base
2025-12-04T09:33:41.6301566Z  * [new branch]              gh/jerryzh168/1/head        -> origin/gh/jerryzh168/1/head
2025-12-04T09:33:41.6302974Z  * [new branch]              gh/jerryzh168/1/orig        -> origin/gh/jerryzh168/1/orig
2025-12-04T09:33:41.6304957Z  * [new branch]              gh/jiayisunx/59/base        -> origin/gh/jiayisunx/59/base
2025-12-04T09:33:41.6306426Z  * [new branch]              gh/jiayisunx/59/head        -> origin/gh/jiayisunx/59/head
2025-12-04T09:33:41.6307750Z  * [new branch]              gh/jiayisunx/59/orig        -> origin/gh/jiayisunx/59/orig
2025-12-04T09:33:41.6309373Z  * [new branch]              gh/jiayisunx/61/base        -> origin/gh/jiayisunx/61/base
2025-12-04T09:33:41.6310667Z  * [new branch]              gh/jiayisunx/61/head        -> origin/gh/jiayisunx/61/head
2025-12-04T09:33:41.6311930Z  * [new branch]              gh/jiayisunx/61/orig        -> origin/gh/jiayisunx/61/orig
2025-12-04T09:33:41.6313686Z  * [new branch]              gh/jiayisunx/68/base        -> origin/gh/jiayisunx/68/base
2025-12-04T09:33:41.6314899Z  * [new branch]              gh/jiayisunx/68/head        -> origin/gh/jiayisunx/68/head
2025-12-04T09:33:41.6316182Z  * [new branch]              gh/jiayisunx/68/orig        -> origin/gh/jiayisunx/68/orig
2025-12-04T09:33:41.6317963Z  * [new branch]              gh/jiayisunx/77/base        -> origin/gh/jiayisunx/77/base
2025-12-04T09:33:41.6319221Z  * [new branch]              gh/jiayisunx/77/head        -> origin/gh/jiayisunx/77/head
2025-12-04T09:33:41.6321030Z  * [new branch]              gh/jiayisunx/77/orig        -> origin/gh/jiayisunx/77/orig
2025-12-04T09:33:41.6322543Z  * [new branch]              gh/jiayisunx/78/base        -> origin/gh/jiayisunx/78/base
2025-12-04T09:33:41.6323866Z  * [new branch]              gh/jiayisunx/78/head        -> origin/gh/jiayisunx/78/head
2025-12-04T09:33:41.6325618Z  * [new branch]              gh/jiayisunx/78/orig        -> origin/gh/jiayisunx/78/orig
2025-12-04T09:33:41.6327295Z  * [new branch]              gh/jiayisunx/79/base        -> origin/gh/jiayisunx/79/base
2025-12-04T09:33:41.6328566Z  * [new branch]              gh/jiayisunx/79/head        -> origin/gh/jiayisunx/79/head
2025-12-04T09:33:41.6329821Z  * [new branch]              gh/jiayisunx/79/orig        -> origin/gh/jiayisunx/79/orig
2025-12-04T09:33:41.6331606Z  * [new branch]              gh/jiayisunx/82/base        -> origin/gh/jiayisunx/82/base
2025-12-04T09:33:41.6332836Z  * [new branch]              gh/jiayisunx/82/head        -> origin/gh/jiayisunx/82/head
2025-12-04T09:33:41.6334148Z  * [new branch]              gh/jiayisunx/82/orig        -> origin/gh/jiayisunx/82/orig
2025-12-04T09:33:41.6335905Z  * [new branch]              gh/jiayisunx/83/base        -> origin/gh/jiayisunx/83/base
2025-12-04T09:33:41.6337290Z  * [new branch]              gh/jiayisunx/83/head        -> origin/gh/jiayisunx/83/head
2025-12-04T09:33:41.6338512Z  * [new branch]              gh/jiayisunx/83/orig        -> origin/gh/jiayisunx/83/orig
2025-12-04T09:33:41.6340585Z  * [new branch]              gh/jiayisunx/84/base        -> origin/gh/jiayisunx/84/base
2025-12-04T09:33:41.6341887Z  * [new branch]              gh/jiayisunx/84/head        -> origin/gh/jiayisunx/84/head
2025-12-04T09:33:41.6343137Z  * [new branch]              gh/jiayisunx/84/orig        -> origin/gh/jiayisunx/84/orig
2025-12-04T09:33:41.6344805Z  * [new branch]              gh/jiayisunx/85/base        -> origin/gh/jiayisunx/85/base
2025-12-04T09:33:41.6346035Z  * [new branch]              gh/jiayisunx/85/head        -> origin/gh/jiayisunx/85/head
2025-12-04T09:33:41.6347314Z  * [new branch]              gh/jiayisunx/85/orig        -> origin/gh/jiayisunx/85/orig
2025-12-04T09:33:41.6348958Z  * [new branch]              gh/jiayisunx/86/base        -> origin/gh/jiayisunx/86/base
2025-12-04T09:33:41.6350193Z  * [new branch]              gh/jiayisunx/86/head        -> origin/gh/jiayisunx/86/head
2025-12-04T09:33:41.6351839Z  * [new branch]              gh/jiayisunx/86/orig        -> origin/gh/jiayisunx/86/orig
2025-12-04T09:33:41.6353433Z  * [new branch]              gh/jiayisunx/87/base        -> origin/gh/jiayisunx/87/base
2025-12-04T09:33:41.6354740Z  * [new branch]              gh/jiayisunx/87/head        -> origin/gh/jiayisunx/87/head
2025-12-04T09:33:41.6355984Z  * [new branch]              gh/jiayisunx/87/orig        -> origin/gh/jiayisunx/87/orig
2025-12-04T09:33:41.6357639Z  * [new branch]              gh/jiayisunx/88/base        -> origin/gh/jiayisunx/88/base
2025-12-04T09:33:41.6358917Z  * [new branch]              gh/jiayisunx/88/head        -> origin/gh/jiayisunx/88/head
2025-12-04T09:33:41.6360201Z  * [new branch]              gh/jiayisunx/88/orig        -> origin/gh/jiayisunx/88/orig
2025-12-04T09:33:41.6361877Z  * [new branch]              gh/jiayisunx/89/base        -> origin/gh/jiayisunx/89/base
2025-12-04T09:33:41.6363244Z  * [new branch]              gh/jiayisunx/89/head        -> origin/gh/jiayisunx/89/head
2025-12-04T09:33:41.6364517Z  * [new branch]              gh/jiayisunx/89/orig        -> origin/gh/jiayisunx/89/orig
2025-12-04T09:33:41.6366149Z  * [new branch]              gh/jiayisunx/90/base        -> origin/gh/jiayisunx/90/base
2025-12-04T09:33:41.6367391Z  * [new branch]              gh/jiayisunx/90/head        -> origin/gh/jiayisunx/90/head
2025-12-04T09:33:41.6368663Z  * [new branch]              gh/jiayisunx/90/orig        -> origin/gh/jiayisunx/90/orig
2025-12-04T09:33:41.6370673Z  * [new branch]              gh/jjwu@meta.com/1/base     -> origin/gh/jjwu@meta.com/1/base
2025-12-04T09:33:41.6371918Z  * [new branch]              gh/jjwu@meta.com/1/head     -> origin/gh/jjwu@meta.com/1/head
2025-12-04T09:33:41.6373886Z  * [new branch]              gh/jturney/1/base           -> origin/gh/jturney/1/base
2025-12-04T09:33:41.6375178Z  * [new branch]              gh/jturney/1/head           -> origin/gh/jturney/1/head
2025-12-04T09:33:41.6376455Z  * [new branch]              gh/jturney/1/orig           -> origin/gh/jturney/1/orig
2025-12-04T09:33:41.6378139Z  * [new branch]              gh/jturney/2/base           -> origin/gh/jturney/2/base
2025-12-04T09:33:41.6379415Z  * [new branch]              gh/jturney/2/head           -> origin/gh/jturney/2/head
2025-12-04T09:33:41.6380674Z  * [new branch]              gh/jturney/2/orig           -> origin/gh/jturney/2/orig
2025-12-04T09:33:41.6382992Z  * [new branch]              gh/karthickai/10/base       -> origin/gh/karthickai/10/base
2025-12-04T09:33:41.6384482Z  * [new branch]              gh/karthickai/10/head       -> origin/gh/karthickai/10/head
2025-12-04T09:33:41.6385773Z  * [new branch]              gh/karthickai/10/orig       -> origin/gh/karthickai/10/orig
2025-12-04T09:33:41.6387943Z  * [new branch]              gh/karthickai/11/base       -> origin/gh/karthickai/11/base
2025-12-04T09:33:41.6389318Z  * [new branch]              gh/karthickai/11/head       -> origin/gh/karthickai/11/head
2025-12-04T09:33:41.6390686Z  * [new branch]              gh/karthickai/11/orig       -> origin/gh/karthickai/11/orig
2025-12-04T09:33:41.6392857Z  * [new branch]              gh/karthickai/12/base       -> origin/gh/karthickai/12/base
2025-12-04T09:33:41.6394213Z  * [new branch]              gh/karthickai/12/head       -> origin/gh/karthickai/12/head
2025-12-04T09:33:41.6395500Z  * [new branch]              gh/karthickai/12/orig       -> origin/gh/karthickai/12/orig
2025-12-04T09:33:41.6397269Z  * [new branch]              gh/karthickai/13/base       -> origin/gh/karthickai/13/base
2025-12-04T09:33:41.6398652Z  * [new branch]              gh/karthickai/13/head       -> origin/gh/karthickai/13/head
2025-12-04T09:33:41.6399948Z  * [new branch]              gh/karthickai/13/orig       -> origin/gh/karthickai/13/orig
2025-12-04T09:33:41.6405130Z  * [new branch]              gh/karthickai/14/base       -> origin/gh/karthickai/14/base
2025-12-04T09:33:41.6406720Z  * [new branch]              gh/karthickai/14/head       -> origin/gh/karthickai/14/head
2025-12-04T09:33:41.6408170Z  * [new branch]              gh/karthickai/14/orig       -> origin/gh/karthickai/14/orig
2025-12-04T09:33:41.6410080Z  * [new branch]              gh/karthickai/15/base       -> origin/gh/karthickai/15/base
2025-12-04T09:33:41.6411422Z  * [new branch]              gh/karthickai/15/head       -> origin/gh/karthickai/15/head
2025-12-04T09:33:41.6413128Z  * [new branch]              gh/karthickai/15/orig       -> origin/gh/karthickai/15/orig
2025-12-04T09:33:41.6414797Z  * [new branch]              gh/karthickai/16/base       -> origin/gh/karthickai/16/base
2025-12-04T09:33:41.6416117Z  * [new branch]              gh/karthickai/16/head       -> origin/gh/karthickai/16/head
2025-12-04T09:33:41.6417393Z  * [new branch]              gh/karthickai/16/orig       -> origin/gh/karthickai/16/orig
2025-12-04T09:33:41.6419006Z  * [new branch]              gh/karthickai/17/base       -> origin/gh/karthickai/17/base
2025-12-04T09:33:41.6420202Z  * [new branch]              gh/karthickai/17/head       -> origin/gh/karthickai/17/head
2025-12-04T09:33:41.6421460Z  * [new branch]              gh/karthickai/17/orig       -> origin/gh/karthickai/17/orig
2025-12-04T09:33:41.6423327Z  * [new branch]              gh/karthickai/18/base       -> origin/gh/karthickai/18/base
2025-12-04T09:33:41.6424904Z  * [new branch]              gh/karthickai/18/head       -> origin/gh/karthickai/18/head
2025-12-04T09:33:41.6426575Z  * [new branch]              gh/karthickai/18/orig       -> origin/gh/karthickai/18/orig
2025-12-04T09:33:41.6428389Z  * [new branch]              gh/karthickai/19/base       -> origin/gh/karthickai/19/base
2025-12-04T09:33:41.6429725Z  * [new branch]              gh/karthickai/19/head       -> origin/gh/karthickai/19/head
2025-12-04T09:33:41.6431004Z  * [new branch]              gh/karthickai/19/orig       -> origin/gh/karthickai/19/orig
2025-12-04T09:33:41.6433648Z  * [new branch]              gh/karthickai/20/base       -> origin/gh/karthickai/20/base
2025-12-04T09:33:41.6435477Z  * [new branch]              gh/karthickai/20/head       -> origin/gh/karthickai/20/head
2025-12-04T09:33:41.6436828Z  * [new branch]              gh/karthickai/20/orig       -> origin/gh/karthickai/20/orig
2025-12-04T09:33:41.6438638Z  * [new branch]              gh/karthickai/21/base       -> origin/gh/karthickai/21/base
2025-12-04T09:33:41.6440135Z  * [new branch]              gh/karthickai/21/head       -> origin/gh/karthickai/21/head
2025-12-04T09:33:41.6441564Z  * [new branch]              gh/karthickai/21/orig       -> origin/gh/karthickai/21/orig
2025-12-04T09:33:41.6443591Z  * [new branch]              gh/karthickai/22/base       -> origin/gh/karthickai/22/base
2025-12-04T09:33:41.6444913Z  * [new branch]              gh/karthickai/22/head       -> origin/gh/karthickai/22/head
2025-12-04T09:33:41.6446175Z  * [new branch]              gh/karthickai/22/orig       -> origin/gh/karthickai/22/orig
2025-12-04T09:33:41.6448065Z  * [new branch]              gh/karthickai/23/base       -> origin/gh/karthickai/23/base
2025-12-04T09:33:41.6449570Z  * [new branch]              gh/karthickai/23/head       -> origin/gh/karthickai/23/head
2025-12-04T09:33:41.6450838Z  * [new branch]              gh/karthickai/23/orig       -> origin/gh/karthickai/23/orig
2025-12-04T09:33:41.6452595Z  * [new branch]              gh/karthickai/24/base       -> origin/gh/karthickai/24/base
2025-12-04T09:33:41.6453898Z  * [new branch]              gh/karthickai/24/head       -> origin/gh/karthickai/24/head
2025-12-04T09:33:41.6455174Z  * [new branch]              gh/karthickai/24/orig       -> origin/gh/karthickai/24/orig
2025-12-04T09:33:41.6457460Z  * [new branch]              gh/karthickai/25/base       -> origin/gh/karthickai/25/base
2025-12-04T09:33:41.6458868Z  * [new branch]              gh/karthickai/25/head       -> origin/gh/karthickai/25/head
2025-12-04T09:33:41.6460152Z  * [new branch]              gh/karthickai/25/orig       -> origin/gh/karthickai/25/orig
2025-12-04T09:33:41.6461756Z  * [new branch]              gh/karthickai/26/base       -> origin/gh/karthickai/26/base
2025-12-04T09:33:41.6463354Z  * [new branch]              gh/karthickai/26/head       -> origin/gh/karthickai/26/head
2025-12-04T09:33:41.6464521Z  * [new branch]              gh/karthickai/26/orig       -> origin/gh/karthickai/26/orig
2025-12-04T09:33:41.6467909Z  * [new branch]              gh/karthickai/6/base        -> origin/gh/karthickai/6/base
2025-12-04T09:33:41.6469931Z  * [new branch]              gh/karthickai/6/head        -> origin/gh/karthickai/6/head
2025-12-04T09:33:41.6471772Z  * [new branch]              gh/karthickai/6/orig        -> origin/gh/karthickai/6/orig
2025-12-04T09:33:41.6473907Z  * [new branch]              gh/krocki/1/base            -> origin/gh/krocki/1/base
2025-12-04T09:33:41.6475187Z  * [new branch]              gh/krocki/1/head            -> origin/gh/krocki/1/head
2025-12-04T09:33:41.6476498Z  * [new branch]              gh/krocki/1/orig            -> origin/gh/krocki/1/orig
2025-12-04T09:33:41.6478721Z  * [new branch]              gh/krocki/2/base            -> origin/gh/krocki/2/base
2025-12-04T09:33:41.6480007Z  * [new branch]              gh/krocki/2/head            -> origin/gh/krocki/2/head
2025-12-04T09:33:41.6481304Z  * [new branch]              gh/krocki/2/orig            -> origin/gh/krocki/2/orig
2025-12-04T09:33:41.6483615Z  * [new branch]              gh/kurtamohler/60/base      -> origin/gh/kurtamohler/60/base
2025-12-04T09:33:41.6484901Z  * [new branch]              gh/kurtamohler/60/head      -> origin/gh/kurtamohler/60/head
2025-12-04T09:33:41.6486139Z  * [new branch]              gh/kurtamohler/60/orig      -> origin/gh/kurtamohler/60/orig
2025-12-04T09:33:41.6487907Z  * [new branch]              gh/kurtamohler/61/base      -> origin/gh/kurtamohler/61/base
2025-12-04T09:33:41.6489141Z  * [new branch]              gh/kurtamohler/61/head      -> origin/gh/kurtamohler/61/head
2025-12-04T09:33:41.6490413Z  * [new branch]              gh/kurtamohler/61/orig      -> origin/gh/kurtamohler/61/orig
2025-12-04T09:33:41.6492133Z  * [new branch]              gh/kurtamohler/62/base      -> origin/gh/kurtamohler/62/base
2025-12-04T09:33:41.6493390Z  * [new branch]              gh/kurtamohler/62/head      -> origin/gh/kurtamohler/62/head
2025-12-04T09:33:41.6494646Z  * [new branch]              gh/kurtamohler/62/orig      -> origin/gh/kurtamohler/62/orig
2025-12-04T09:33:41.6496319Z  * [new branch]              gh/kurtamohler/63/base      -> origin/gh/kurtamohler/63/base
2025-12-04T09:33:41.6497605Z  * [new branch]              gh/kurtamohler/63/head      -> origin/gh/kurtamohler/63/head
2025-12-04T09:33:41.6498884Z  * [new branch]              gh/kurtamohler/63/orig      -> origin/gh/kurtamohler/63/orig
2025-12-04T09:33:41.6500726Z  * [new branch]              gh/kurtamohler/64/base      -> origin/gh/kurtamohler/64/base
2025-12-04T09:33:41.6502310Z  * [new branch]              gh/kurtamohler/64/head      -> origin/gh/kurtamohler/64/head
2025-12-04T09:33:41.6503669Z  * [new branch]              gh/kurtamohler/64/orig      -> origin/gh/kurtamohler/64/orig
2025-12-04T09:33:41.6505455Z  * [new branch]              gh/kurtamohler/65/base      -> origin/gh/kurtamohler/65/base
2025-12-04T09:33:41.6506672Z  * [new branch]              gh/kurtamohler/65/head      -> origin/gh/kurtamohler/65/head
2025-12-04T09:33:41.6507938Z  * [new branch]              gh/kurtamohler/65/orig      -> origin/gh/kurtamohler/65/orig
2025-12-04T09:33:41.6509609Z  * [new branch]              gh/kurtamohler/66/base      -> origin/gh/kurtamohler/66/base
2025-12-04T09:33:41.6510927Z  * [new branch]              gh/kurtamohler/66/head      -> origin/gh/kurtamohler/66/head
2025-12-04T09:33:41.6512313Z  * [new branch]              gh/kurtamohler/66/orig      -> origin/gh/kurtamohler/66/orig
2025-12-04T09:33:41.6513928Z  * [new branch]              gh/kurtamohler/67/base      -> origin/gh/kurtamohler/67/base
2025-12-04T09:33:41.6515199Z  * [new branch]              gh/kurtamohler/67/head      -> origin/gh/kurtamohler/67/head
2025-12-04T09:33:41.6516590Z  * [new branch]              gh/kurtamohler/67/orig      -> origin/gh/kurtamohler/67/orig
2025-12-04T09:33:41.6518803Z  * [new branch]              gh/kwen2501/130/base        -> origin/gh/kwen2501/130/base
2025-12-04T09:33:41.6520226Z  * [new branch]              gh/kwen2501/130/head        -> origin/gh/kwen2501/130/head
2025-12-04T09:33:41.6521588Z  * [new branch]              gh/kwen2501/130/orig        -> origin/gh/kwen2501/130/orig
2025-12-04T09:33:41.6523482Z  * [new branch]              gh/kwen2501/170/base        -> origin/gh/kwen2501/170/base
2025-12-04T09:33:41.6524774Z  * [new branch]              gh/kwen2501/170/head        -> origin/gh/kwen2501/170/head
2025-12-04T09:33:41.6526545Z  * [new branch]              gh/kwen2501/187/base        -> origin/gh/kwen2501/187/base
2025-12-04T09:33:41.6527875Z  * [new branch]              gh/kwen2501/187/head        -> origin/gh/kwen2501/187/head
2025-12-04T09:33:41.6529193Z  * [new branch]              gh/kwen2501/187/orig        -> origin/gh/kwen2501/187/orig
2025-12-04T09:33:41.6530876Z  * [new branch]              gh/kwen2501/188/base        -> origin/gh/kwen2501/188/base
2025-12-04T09:33:41.6532184Z  * [new branch]              gh/kwen2501/188/head        -> origin/gh/kwen2501/188/head
2025-12-04T09:33:41.6533973Z  * [new branch]              gh/kwen2501/188/orig        -> origin/gh/kwen2501/188/orig
2025-12-04T09:33:41.6535766Z  * [new branch]              gh/kwen2501/211/base        -> origin/gh/kwen2501/211/base
2025-12-04T09:33:41.6537018Z  * [new branch]              gh/kwen2501/211/head        -> origin/gh/kwen2501/211/head
2025-12-04T09:33:41.6538856Z  * [new branch]              gh/kwen2501/224/base        -> origin/gh/kwen2501/224/base
2025-12-04T09:33:41.6540114Z  * [new branch]              gh/kwen2501/224/head        -> origin/gh/kwen2501/224/head
2025-12-04T09:33:41.6541395Z  * [new branch]              gh/kwen2501/224/orig        -> origin/gh/kwen2501/224/orig
2025-12-04T09:33:41.6543064Z  * [new branch]              gh/kwen2501/228/base        -> origin/gh/kwen2501/228/base
2025-12-04T09:33:41.6544389Z  * [new branch]              gh/kwen2501/228/head        -> origin/gh/kwen2501/228/head
2025-12-04T09:33:41.6545647Z  * [new branch]              gh/kwen2501/228/orig        -> origin/gh/kwen2501/228/orig
2025-12-04T09:33:41.6547545Z  * [new branch]              gh/kwen2501/234/base        -> origin/gh/kwen2501/234/base
2025-12-04T09:33:41.6548787Z  * [new branch]              gh/kwen2501/234/head        -> origin/gh/kwen2501/234/head
2025-12-04T09:33:41.6550043Z  * [new branch]              gh/kwen2501/234/orig        -> origin/gh/kwen2501/234/orig
2025-12-04T09:33:41.6551805Z  * [new branch]              gh/kwen2501/235/base        -> origin/gh/kwen2501/235/base
2025-12-04T09:33:41.6553058Z  * [new branch]              gh/kwen2501/235/head        -> origin/gh/kwen2501/235/head
2025-12-04T09:33:41.6554329Z  * [new branch]              gh/kwen2501/235/orig        -> origin/gh/kwen2501/235/orig
2025-12-04T09:33:41.6555938Z  * [new branch]              gh/kwen2501/236/base        -> origin/gh/kwen2501/236/base
2025-12-04T09:33:41.6557304Z  * [new branch]              gh/kwen2501/236/head        -> origin/gh/kwen2501/236/head
2025-12-04T09:33:41.6558530Z  * [new branch]              gh/kwen2501/236/orig        -> origin/gh/kwen2501/236/orig
2025-12-04T09:33:41.6560224Z  * [new branch]              gh/kwen2501/237/base        -> origin/gh/kwen2501/237/base
2025-12-04T09:33:41.6561501Z  * [new branch]              gh/kwen2501/237/head        -> origin/gh/kwen2501/237/head
2025-12-04T09:33:41.6562894Z  * [new branch]              gh/kwen2501/237/orig        -> origin/gh/kwen2501/237/orig
2025-12-04T09:33:41.6564596Z  * [new branch]              gh/kwen2501/238/base        -> origin/gh/kwen2501/238/base
2025-12-04T09:33:41.6565796Z  * [new branch]              gh/kwen2501/238/head        -> origin/gh/kwen2501/238/head
2025-12-04T09:33:41.6567094Z  * [new branch]              gh/kwen2501/238/orig        -> origin/gh/kwen2501/238/orig
2025-12-04T09:33:41.6568959Z  * [new branch]              gh/kwen2501/240/base        -> origin/gh/kwen2501/240/base
2025-12-04T09:33:41.6570107Z  * [new branch]              gh/kwen2501/240/head        -> origin/gh/kwen2501/240/head
2025-12-04T09:33:41.6571381Z  * [new branch]              gh/kwen2501/240/orig        -> origin/gh/kwen2501/240/orig
2025-12-04T09:33:41.6572987Z  * [new branch]              gh/kwen2501/241/base        -> origin/gh/kwen2501/241/base
2025-12-04T09:33:41.6574278Z  * [new branch]              gh/kwen2501/241/head        -> origin/gh/kwen2501/241/head
2025-12-04T09:33:41.6575520Z  * [new branch]              gh/kwen2501/241/orig        -> origin/gh/kwen2501/241/orig
2025-12-04T09:33:41.6577193Z  * [new branch]              gh/kwen2501/247/base        -> origin/gh/kwen2501/247/base
2025-12-04T09:33:41.6578428Z  * [new branch]              gh/kwen2501/247/head        -> origin/gh/kwen2501/247/head
2025-12-04T09:33:41.6579963Z  * [new branch]              gh/kwen2501/247/orig        -> origin/gh/kwen2501/247/orig
2025-12-04T09:33:41.6581412Z  * [new branch]              gh/kwen2501/252/base        -> origin/gh/kwen2501/252/base
2025-12-04T09:33:41.6582623Z  * [new branch]              gh/kwen2501/252/head        -> origin/gh/kwen2501/252/head
2025-12-04T09:33:41.6583923Z  * [new branch]              gh/kwen2501/252/orig        -> origin/gh/kwen2501/252/orig
2025-12-04T09:33:41.6586209Z  * [new branch]              gh/kwen2501/259/base        -> origin/gh/kwen2501/259/base
2025-12-04T09:33:41.6587609Z  * [new branch]              gh/kwen2501/259/head        -> origin/gh/kwen2501/259/head
2025-12-04T09:33:41.6588889Z  * [new branch]              gh/kwen2501/259/orig        -> origin/gh/kwen2501/259/orig
2025-12-04T09:33:41.6590748Z  * [new branch]              gh/kwen2501/260/base        -> origin/gh/kwen2501/260/base
2025-12-04T09:33:41.6592174Z  * [new branch]              gh/kwen2501/260/head        -> origin/gh/kwen2501/260/head
2025-12-04T09:33:41.6593404Z  * [new branch]              gh/kwen2501/260/orig        -> origin/gh/kwen2501/260/orig
2025-12-04T09:33:41.6595131Z  * [new branch]              gh/kwen2501/268/base        -> origin/gh/kwen2501/268/base
2025-12-04T09:33:41.6596408Z  * [new branch]              gh/kwen2501/268/head        -> origin/gh/kwen2501/268/head
2025-12-04T09:33:41.6597635Z  * [new branch]              gh/kwen2501/268/orig        -> origin/gh/kwen2501/268/orig
2025-12-04T09:33:41.6599466Z  * [new branch]              gh/kwen2501/269/base        -> origin/gh/kwen2501/269/base
2025-12-04T09:33:41.6600807Z  * [new branch]              gh/kwen2501/269/head        -> origin/gh/kwen2501/269/head
2025-12-04T09:33:41.6602465Z  * [new branch]              gh/kwen2501/269/orig        -> origin/gh/kwen2501/269/orig
2025-12-04T09:33:41.6604433Z  * [new branch]              gh/kwen2501/270/base        -> origin/gh/kwen2501/270/base
2025-12-04T09:33:41.6605857Z  * [new branch]              gh/kwen2501/270/head        -> origin/gh/kwen2501/270/head
2025-12-04T09:33:41.6607122Z  * [new branch]              gh/kwen2501/270/orig        -> origin/gh/kwen2501/270/orig
2025-12-04T09:33:41.6608928Z  * [new branch]              gh/kwen2501/271/base        -> origin/gh/kwen2501/271/base
2025-12-04T09:33:41.6610237Z  * [new branch]              gh/kwen2501/271/head        -> origin/gh/kwen2501/271/head
2025-12-04T09:33:41.6611553Z  * [new branch]              gh/kwen2501/271/orig        -> origin/gh/kwen2501/271/orig
2025-12-04T09:33:41.6613391Z  * [new branch]              gh/kwen2501/274/base        -> origin/gh/kwen2501/274/base
2025-12-04T09:33:41.6614857Z  * [new branch]              gh/kwen2501/274/head        -> origin/gh/kwen2501/274/head
2025-12-04T09:33:41.6616138Z  * [new branch]              gh/kwen2501/274/orig        -> origin/gh/kwen2501/274/orig
2025-12-04T09:33:41.6618458Z  * [new branch]              gh/kwen2501/275/base        -> origin/gh/kwen2501/275/base
2025-12-04T09:33:41.6619895Z  * [new branch]              gh/kwen2501/275/head        -> origin/gh/kwen2501/275/head
2025-12-04T09:33:41.6621324Z  * [new branch]              gh/kwen2501/275/orig        -> origin/gh/kwen2501/275/orig
2025-12-04T09:33:41.6623064Z  * [new branch]              gh/kwen2501/276/base        -> origin/gh/kwen2501/276/base
2025-12-04T09:33:41.6624353Z  * [new branch]              gh/kwen2501/276/head        -> origin/gh/kwen2501/276/head
2025-12-04T09:33:41.6625610Z  * [new branch]              gh/kwen2501/276/orig        -> origin/gh/kwen2501/276/orig
2025-12-04T09:33:41.6627380Z  * [new branch]              gh/kwen2501/277/base        -> origin/gh/kwen2501/277/base
2025-12-04T09:33:41.6628639Z  * [new branch]              gh/kwen2501/277/head        -> origin/gh/kwen2501/277/head
2025-12-04T09:33:41.6629909Z  * [new branch]              gh/kwen2501/277/orig        -> origin/gh/kwen2501/277/orig
2025-12-04T09:33:41.6631675Z  * [new branch]              gh/kwen2501/278/base        -> origin/gh/kwen2501/278/base
2025-12-04T09:33:41.6632970Z  * [new branch]              gh/kwen2501/278/head        -> origin/gh/kwen2501/278/head
2025-12-04T09:33:41.6634264Z  * [new branch]              gh/kwen2501/278/orig        -> origin/gh/kwen2501/278/orig
2025-12-04T09:33:41.6636113Z  * [new branch]              gh/kwen2501/279/base        -> origin/gh/kwen2501/279/base
2025-12-04T09:33:41.6637521Z  * [new branch]              gh/kwen2501/279/head        -> origin/gh/kwen2501/279/head
2025-12-04T09:33:41.6638877Z  * [new branch]              gh/kwen2501/279/orig        -> origin/gh/kwen2501/279/orig
2025-12-04T09:33:41.6640797Z  * [new branch]              gh/kwen2501/280/base        -> origin/gh/kwen2501/280/base
2025-12-04T09:33:41.6642138Z  * [new branch]              gh/kwen2501/280/head        -> origin/gh/kwen2501/280/head
2025-12-04T09:33:41.6643584Z  * [new branch]              gh/kwen2501/280/orig        -> origin/gh/kwen2501/280/orig
2025-12-04T09:33:41.6645364Z  * [new branch]              gh/kwen2501/281/base        -> origin/gh/kwen2501/281/base
2025-12-04T09:33:41.6646669Z  * [new branch]              gh/kwen2501/281/head        -> origin/gh/kwen2501/281/head
2025-12-04T09:33:41.6647955Z  * [new branch]              gh/kwen2501/281/orig        -> origin/gh/kwen2501/281/orig
2025-12-04T09:33:41.6649705Z  * [new branch]              gh/kwen2501/282/base        -> origin/gh/kwen2501/282/base
2025-12-04T09:33:41.6651037Z  * [new branch]              gh/kwen2501/282/head        -> origin/gh/kwen2501/282/head
2025-12-04T09:33:41.6652336Z  * [new branch]              gh/kwen2501/282/orig        -> origin/gh/kwen2501/282/orig
2025-12-04T09:33:41.6654063Z  * [new branch]              gh/kwen2501/283/base        -> origin/gh/kwen2501/283/base
2025-12-04T09:33:41.6655517Z  * [new branch]              gh/kwen2501/283/head        -> origin/gh/kwen2501/283/head
2025-12-04T09:33:41.6656765Z  * [new branch]              gh/kwen2501/283/orig        -> origin/gh/kwen2501/283/orig
2025-12-04T09:33:41.6658634Z  * [new branch]              gh/kwen2501/284/base        -> origin/gh/kwen2501/284/base
2025-12-04T09:33:41.6659973Z  * [new branch]              gh/kwen2501/284/head        -> origin/gh/kwen2501/284/head
2025-12-04T09:33:41.6661330Z  * [new branch]              gh/kwen2501/284/orig        -> origin/gh/kwen2501/284/orig
2025-12-04T09:33:41.6663064Z  * [new branch]              gh/kwen2501/285/base        -> origin/gh/kwen2501/285/base
2025-12-04T09:33:41.6664284Z  * [new branch]              gh/kwen2501/285/head        -> origin/gh/kwen2501/285/head
2025-12-04T09:33:41.6665602Z  * [new branch]              gh/kwen2501/285/orig        -> origin/gh/kwen2501/285/orig
2025-12-04T09:33:41.6667320Z  * [new branch]              gh/kwen2501/286/base        -> origin/gh/kwen2501/286/base
2025-12-04T09:33:41.6668661Z  * [new branch]              gh/kwen2501/286/head        -> origin/gh/kwen2501/286/head
2025-12-04T09:33:41.6669933Z  * [new branch]              gh/kwen2501/286/orig        -> origin/gh/kwen2501/286/orig
2025-12-04T09:33:41.6671533Z  * [new branch]              gh/kwen2501/287/base        -> origin/gh/kwen2501/287/base
2025-12-04T09:33:41.6672895Z  * [new branch]              gh/kwen2501/287/head        -> origin/gh/kwen2501/287/head
2025-12-04T09:33:41.6674133Z  * [new branch]              gh/kwen2501/287/orig        -> origin/gh/kwen2501/287/orig
2025-12-04T09:33:41.6676022Z  * [new branch]              gh/kwen2501/288/base        -> origin/gh/kwen2501/288/base
2025-12-04T09:33:41.6677347Z  * [new branch]              gh/kwen2501/288/head        -> origin/gh/kwen2501/288/head
2025-12-04T09:33:41.6678631Z  * [new branch]              gh/kwen2501/288/orig        -> origin/gh/kwen2501/288/orig
2025-12-04T09:33:41.6680618Z  * [new branch]              gh/laithsakka/251/base      -> origin/gh/laithsakka/251/base
2025-12-04T09:33:41.6681904Z  * [new branch]              gh/laithsakka/251/head      -> origin/gh/laithsakka/251/head
2025-12-04T09:33:41.6683420Z  * [new branch]              gh/laithsakka/251/orig      -> origin/gh/laithsakka/251/orig
2025-12-04T09:33:41.6685132Z  * [new branch]              gh/laithsakka/276/base      -> origin/gh/laithsakka/276/base
2025-12-04T09:33:41.6686394Z  * [new branch]              gh/laithsakka/276/head      -> origin/gh/laithsakka/276/head
2025-12-04T09:33:41.6687674Z  * [new branch]              gh/laithsakka/276/orig      -> origin/gh/laithsakka/276/orig
2025-12-04T09:33:41.6689448Z  * [new branch]              gh/laithsakka/28/base       -> origin/gh/laithsakka/28/base
2025-12-04T09:33:41.6690972Z  * [new branch]              gh/laithsakka/29/base       -> origin/gh/laithsakka/29/base
2025-12-04T09:33:41.6692534Z  * [new branch]              gh/laithsakka/30/base       -> origin/gh/laithsakka/30/base
2025-12-04T09:33:41.6693850Z  * [new branch]              gh/laithsakka/30/head       -> origin/gh/laithsakka/30/head
2025-12-04T09:33:41.6695539Z  * [new branch]              gh/laithsakka/31/base       -> origin/gh/laithsakka/31/base
2025-12-04T09:33:41.6696805Z  * [new branch]              gh/laithsakka/31/head       -> origin/gh/laithsakka/31/head
2025-12-04T09:33:41.6698608Z  * [new branch]              gh/laithsakka/313/base      -> origin/gh/laithsakka/313/base
2025-12-04T09:33:41.6699831Z  * [new branch]              gh/laithsakka/313/head      -> origin/gh/laithsakka/313/head
2025-12-04T09:33:41.6701347Z  * [new branch]              gh/laithsakka/313/orig      -> origin/gh/laithsakka/313/orig
2025-12-04T09:33:41.6703534Z  * [new branch]              gh/laithsakka/316/base      -> origin/gh/laithsakka/316/base
2025-12-04T09:33:41.6704758Z  * [new branch]              gh/laithsakka/316/head      -> origin/gh/laithsakka/316/head
2025-12-04T09:33:41.6706007Z  * [new branch]              gh/laithsakka/316/orig      -> origin/gh/laithsakka/316/orig
2025-12-04T09:33:41.6707759Z  * [new branch]              gh/laithsakka/317/base      -> origin/gh/laithsakka/317/base
2025-12-04T09:33:41.6709029Z  * [new branch]              gh/laithsakka/317/head      -> origin/gh/laithsakka/317/head
2025-12-04T09:33:41.6710248Z  * [new branch]              gh/laithsakka/317/orig      -> origin/gh/laithsakka/317/orig
2025-12-04T09:33:41.6711986Z  * [new branch]              gh/laithsakka/319/base      -> origin/gh/laithsakka/319/base
2025-12-04T09:33:41.6713349Z  * [new branch]              gh/laithsakka/319/head      -> origin/gh/laithsakka/319/head
2025-12-04T09:33:41.6714644Z  * [new branch]              gh/laithsakka/319/orig      -> origin/gh/laithsakka/319/orig
2025-12-04T09:33:41.6716722Z  * [new branch]              gh/laithsakka/32/base       -> origin/gh/laithsakka/32/base
2025-12-04T09:33:41.6717949Z  * [new branch]              gh/laithsakka/32/head       -> origin/gh/laithsakka/32/head
2025-12-04T09:33:41.6719784Z  * [new branch]              gh/laithsakka/320/base      -> origin/gh/laithsakka/320/base
2025-12-04T09:33:41.6721028Z  * [new branch]              gh/laithsakka/320/head      -> origin/gh/laithsakka/320/head
2025-12-04T09:33:41.6722329Z  * [new branch]              gh/laithsakka/320/orig      -> origin/gh/laithsakka/320/orig
2025-12-04T09:33:41.6724077Z  * [new branch]              gh/laithsakka/321/base      -> origin/gh/laithsakka/321/base
2025-12-04T09:33:41.6725477Z  * [new branch]              gh/laithsakka/321/head      -> origin/gh/laithsakka/321/head
2025-12-04T09:33:41.6726763Z  * [new branch]              gh/laithsakka/321/orig      -> origin/gh/laithsakka/321/orig
2025-12-04T09:33:41.6728703Z  * [new branch]              gh/laithsakka/322/base      -> origin/gh/laithsakka/322/base
2025-12-04T09:33:41.6730029Z  * [new branch]              gh/laithsakka/322/head      -> origin/gh/laithsakka/322/head
2025-12-04T09:33:41.6731307Z  * [new branch]              gh/laithsakka/322/orig      -> origin/gh/laithsakka/322/orig
2025-12-04T09:33:41.6733087Z  * [new branch]              gh/laithsakka/323/base      -> origin/gh/laithsakka/323/base
2025-12-04T09:33:41.6734453Z  * [new branch]              gh/laithsakka/323/head      -> origin/gh/laithsakka/323/head
2025-12-04T09:33:41.6735794Z  * [new branch]              gh/laithsakka/323/orig      -> origin/gh/laithsakka/323/orig
2025-12-04T09:33:41.6737591Z  * [new branch]              gh/laithsakka/324/base      -> origin/gh/laithsakka/324/base
2025-12-04T09:33:41.6738826Z  * [new branch]              gh/laithsakka/324/head      -> origin/gh/laithsakka/324/head
2025-12-04T09:33:41.6740031Z  * [new branch]              gh/laithsakka/324/orig      -> origin/gh/laithsakka/324/orig
2025-12-04T09:33:41.6741860Z  * [new branch]              gh/laithsakka/325/base      -> origin/gh/laithsakka/325/base
2025-12-04T09:33:41.6743178Z  * [new branch]              gh/laithsakka/325/head      -> origin/gh/laithsakka/325/head
2025-12-04T09:33:41.6744561Z  * [new branch]              gh/laithsakka/325/orig      -> origin/gh/laithsakka/325/orig
2025-12-04T09:33:41.6746592Z  * [new branch]              gh/laithsakka/326/base      -> origin/gh/laithsakka/326/base
2025-12-04T09:33:41.6747899Z  * [new branch]              gh/laithsakka/326/head      -> origin/gh/laithsakka/326/head
2025-12-04T09:33:41.6749221Z  * [new branch]              gh/laithsakka/326/orig      -> origin/gh/laithsakka/326/orig
2025-12-04T09:33:41.6751007Z  * [new branch]              gh/laithsakka/327/base      -> origin/gh/laithsakka/327/base
2025-12-04T09:33:41.6752336Z  * [new branch]              gh/laithsakka/327/head      -> origin/gh/laithsakka/327/head
2025-12-04T09:33:41.6753718Z  * [new branch]              gh/laithsakka/327/orig      -> origin/gh/laithsakka/327/orig
2025-12-04T09:33:41.6755449Z  * [new branch]              gh/laithsakka/328/base      -> origin/gh/laithsakka/328/base
2025-12-04T09:33:41.6756722Z  * [new branch]              gh/laithsakka/328/head      -> origin/gh/laithsakka/328/head
2025-12-04T09:33:41.6757967Z  * [new branch]              gh/laithsakka/328/orig      -> origin/gh/laithsakka/328/orig
2025-12-04T09:33:41.6760010Z  * [new branch]              gh/liangel/4/base           -> origin/gh/liangel/4/base
2025-12-04T09:33:41.6761401Z  * [new branch]              gh/liangel/4/head           -> origin/gh/liangel/4/head
2025-12-04T09:33:41.6762773Z  * [new branch]              gh/liangel/4/orig           -> origin/gh/liangel/4/orig
2025-12-04T09:33:41.6767048Z  * [new branch]              gh/lucaskabela/1/base       -> origin/gh/lucaskabela/1/base
2025-12-04T09:33:41.6768578Z  * [new branch]              gh/lucaskabela/1/head       -> origin/gh/lucaskabela/1/head
2025-12-04T09:33:41.6770642Z  * [new branch]              gh/lw/4/base                -> origin/gh/lw/4/base
2025-12-04T09:33:41.6771907Z  * [new branch]              gh/lw/4/head                -> origin/gh/lw/4/head
2025-12-04T09:33:41.6773229Z  * [new branch]              gh/lw/4/orig                -> origin/gh/lw/4/orig
2025-12-04T09:33:41.6774920Z  * [new branch]              gh/lw/5/base                -> origin/gh/lw/5/base
2025-12-04T09:33:41.6776288Z  * [new branch]              gh/lw/5/head                -> origin/gh/lw/5/head
2025-12-04T09:33:41.6778051Z  * [new branch]              gh/lw/5/orig                -> origin/gh/lw/5/orig
2025-12-04T09:33:41.6779820Z  * [new branch]              gh/lw/6/base                -> origin/gh/lw/6/base
2025-12-04T09:33:41.6781182Z  * [new branch]              gh/lw/6/head                -> origin/gh/lw/6/head
2025-12-04T09:33:41.6782398Z  * [new branch]              gh/lw/6/orig                -> origin/gh/lw/6/orig
2025-12-04T09:33:41.6784391Z  * [new branch]              gh/malfet/14/base           -> origin/gh/malfet/14/base
2025-12-04T09:33:41.6786078Z  * [new branch]              gh/malfet/417/base          -> origin/gh/malfet/417/base
2025-12-04T09:33:41.6787567Z  * [new branch]              gh/malfet/417/head          -> origin/gh/malfet/417/head
2025-12-04T09:33:41.6788913Z  * [new branch]              gh/malfet/417/orig          -> origin/gh/malfet/417/orig
2025-12-04T09:33:41.6790549Z  * [new branch]              gh/malfet/506/base          -> origin/gh/malfet/506/base
2025-12-04T09:33:41.6791856Z  * [new branch]              gh/malfet/506/head          -> origin/gh/malfet/506/head
2025-12-04T09:33:41.6793118Z  * [new branch]              gh/malfet/506/orig          -> origin/gh/malfet/506/orig
2025-12-04T09:33:41.6794835Z  * [new branch]              gh/malfet/517/base          -> origin/gh/malfet/517/base
2025-12-04T09:33:41.6796213Z  * [new branch]              gh/malfet/517/head          -> origin/gh/malfet/517/head
2025-12-04T09:33:41.6798672Z  * [new branch]              gh/malfet/528/base          -> origin/gh/malfet/528/base
2025-12-04T09:33:41.6800017Z  * [new branch]              gh/malfet/528/head          -> origin/gh/malfet/528/head
2025-12-04T09:33:41.6801349Z  * [new branch]              gh/malfet/528/orig          -> origin/gh/malfet/528/orig
2025-12-04T09:33:41.6806146Z  * [new branch]              gh/malfet/537/base          -> origin/gh/malfet/537/base
2025-12-04T09:33:41.6807406Z  * [new branch]              gh/malfet/537/head          -> origin/gh/malfet/537/head
2025-12-04T09:33:41.6808853Z  * [new branch]              gh/malfet/537/orig          -> origin/gh/malfet/537/orig
2025-12-04T09:33:41.6810657Z  * [new branch]              gh/malfet/546/base          -> origin/gh/malfet/546/base
2025-12-04T09:33:41.6812584Z  * [new branch]              gh/malfet/546/head          -> origin/gh/malfet/546/head
2025-12-04T09:33:41.6813783Z  * [new branch]              gh/malfet/546/orig          -> origin/gh/malfet/546/orig
2025-12-04T09:33:41.6815441Z  * [new branch]              gh/malfet/565/base          -> origin/gh/malfet/565/base
2025-12-04T09:33:41.6816798Z  * [new branch]              gh/malfet/565/head          -> origin/gh/malfet/565/head
2025-12-04T09:33:41.6818166Z  * [new branch]              gh/malfet/565/orig          -> origin/gh/malfet/565/orig
2025-12-04T09:33:41.6820218Z  * [new branch]              gh/malfet/575/base          -> origin/gh/malfet/575/base
2025-12-04T09:33:41.6821593Z  * [new branch]              gh/malfet/575/head          -> origin/gh/malfet/575/head
2025-12-04T09:33:41.6822892Z  * [new branch]              gh/malfet/575/orig          -> origin/gh/malfet/575/orig
2025-12-04T09:33:41.6824597Z  * [new branch]              gh/malfet/580/base          -> origin/gh/malfet/580/base
2025-12-04T09:33:41.6825844Z  * [new branch]              gh/malfet/580/head          -> origin/gh/malfet/580/head
2025-12-04T09:33:41.6827107Z  * [new branch]              gh/malfet/580/orig          -> origin/gh/malfet/580/orig
2025-12-04T09:33:41.6828773Z  * [new branch]              gh/malfet/581/base          -> origin/gh/malfet/581/base
2025-12-04T09:33:41.6830226Z  * [new branch]              gh/malfet/581/head          -> origin/gh/malfet/581/head
2025-12-04T09:33:41.6831602Z  * [new branch]              gh/malfet/581/orig          -> origin/gh/malfet/581/orig
2025-12-04T09:33:41.6833707Z  * [new branch]              gh/malfet/583/base          -> origin/gh/malfet/583/base
2025-12-04T09:33:41.6835042Z  * [new branch]              gh/malfet/583/head          -> origin/gh/malfet/583/head
2025-12-04T09:33:41.6836399Z  * [new branch]              gh/malfet/583/orig          -> origin/gh/malfet/583/orig
2025-12-04T09:33:41.6838112Z  * [new branch]              gh/malfet/586/base          -> origin/gh/malfet/586/base
2025-12-04T09:33:41.6839469Z  * [new branch]              gh/malfet/586/head          -> origin/gh/malfet/586/head
2025-12-04T09:33:41.6840624Z  * [new branch]              gh/malfet/586/orig          -> origin/gh/malfet/586/orig
2025-12-04T09:33:41.6842379Z  * [new branch]              gh/malfet/587/base          -> origin/gh/malfet/587/base
2025-12-04T09:33:41.6843732Z  * [new branch]              gh/malfet/587/head          -> origin/gh/malfet/587/head
2025-12-04T09:33:41.6845011Z  * [new branch]              gh/malfet/587/orig          -> origin/gh/malfet/587/orig
2025-12-04T09:33:41.6846684Z  * [new branch]              gh/malfet/588/base          -> origin/gh/malfet/588/base
2025-12-04T09:33:41.6847937Z  * [new branch]              gh/malfet/588/head          -> origin/gh/malfet/588/head
2025-12-04T09:33:41.6849395Z  * [new branch]              gh/malfet/588/orig          -> origin/gh/malfet/588/orig
2025-12-04T09:33:41.6851166Z  * [new branch]              gh/malfet/589/base          -> origin/gh/malfet/589/base
2025-12-04T09:33:41.6852435Z  * [new branch]              gh/malfet/589/head          -> origin/gh/malfet/589/head
2025-12-04T09:33:41.6853820Z  * [new branch]              gh/malfet/589/orig          -> origin/gh/malfet/589/orig
2025-12-04T09:33:41.6855455Z  * [new branch]              gh/malfet/590/base          -> origin/gh/malfet/590/base
2025-12-04T09:33:41.6856723Z  * [new branch]              gh/malfet/590/head          -> origin/gh/malfet/590/head
2025-12-04T09:33:41.6858004Z  * [new branch]              gh/malfet/590/orig          -> origin/gh/malfet/590/orig
2025-12-04T09:33:41.6860193Z  * [new branch]              gh/malfet/591/base          -> origin/gh/malfet/591/base
2025-12-04T09:33:41.6861464Z  * [new branch]              gh/malfet/591/head          -> origin/gh/malfet/591/head
2025-12-04T09:33:41.6862803Z  * [new branch]              gh/malfet/591/orig          -> origin/gh/malfet/591/orig
2025-12-04T09:33:41.6864499Z  * [new branch]              gh/malfet/592/base          -> origin/gh/malfet/592/base
2025-12-04T09:33:41.6865810Z  * [new branch]              gh/malfet/592/head          -> origin/gh/malfet/592/head
2025-12-04T09:33:41.6867060Z  * [new branch]              gh/malfet/592/orig          -> origin/gh/malfet/592/orig
2025-12-04T09:33:41.6868828Z  * [new branch]              gh/malfet/593/base          -> origin/gh/malfet/593/base
2025-12-04T09:33:41.6870066Z  * [new branch]              gh/malfet/593/head          -> origin/gh/malfet/593/head
2025-12-04T09:33:41.6871462Z  * [new branch]              gh/malfet/593/orig          -> origin/gh/malfet/593/orig
2025-12-04T09:33:41.6873276Z  * [new branch]              gh/malfet/594/base          -> origin/gh/malfet/594/base
2025-12-04T09:33:41.6874552Z  * [new branch]              gh/malfet/594/head          -> origin/gh/malfet/594/head
2025-12-04T09:33:41.6876290Z  * [new branch]              gh/malfet/594/orig          -> origin/gh/malfet/594/orig
2025-12-04T09:33:41.6877936Z  * [new branch]              gh/malfet/595/base          -> origin/gh/malfet/595/base
2025-12-04T09:33:41.6879210Z  * [new branch]              gh/malfet/595/head          -> origin/gh/malfet/595/head
2025-12-04T09:33:41.6880569Z  * [new branch]              gh/malfet/595/orig          -> origin/gh/malfet/595/orig
2025-12-04T09:33:41.6882260Z  * [new branch]              gh/malfet/596/base          -> origin/gh/malfet/596/base
2025-12-04T09:33:41.6883601Z  * [new branch]              gh/malfet/596/head          -> origin/gh/malfet/596/head
2025-12-04T09:33:41.6884876Z  * [new branch]              gh/malfet/596/orig          -> origin/gh/malfet/596/orig
2025-12-04T09:33:41.6887056Z  * [new branch]              gh/malfet/597/base          -> origin/gh/malfet/597/base
2025-12-04T09:33:41.6888320Z  * [new branch]              gh/malfet/597/head          -> origin/gh/malfet/597/head
2025-12-04T09:33:41.6889704Z  * [new branch]              gh/malfet/597/orig          -> origin/gh/malfet/597/orig
2025-12-04T09:33:41.6891437Z  * [new branch]              gh/malfet/598/base          -> origin/gh/malfet/598/base
2025-12-04T09:33:41.6892743Z  * [new branch]              gh/malfet/598/head          -> origin/gh/malfet/598/head
2025-12-04T09:33:41.6893992Z  * [new branch]              gh/malfet/598/orig          -> origin/gh/malfet/598/orig
2025-12-04T09:33:41.6895700Z  * [new branch]              gh/malfet/599/base          -> origin/gh/malfet/599/base
2025-12-04T09:33:41.6896999Z  * [new branch]              gh/malfet/599/head          -> origin/gh/malfet/599/head
2025-12-04T09:33:41.6898250Z  * [new branch]              gh/malfet/599/orig          -> origin/gh/malfet/599/orig
2025-12-04T09:33:41.6899942Z  * [new branch]              gh/malfet/600/base          -> origin/gh/malfet/600/base
2025-12-04T09:33:41.6902017Z  * [new branch]              gh/malfet/600/head          -> origin/gh/malfet/600/head
2025-12-04T09:33:41.6903249Z  * [new branch]              gh/malfet/600/orig          -> origin/gh/malfet/600/orig
2025-12-04T09:33:41.6905236Z  * [new branch]              gh/malfet/601/base          -> origin/gh/malfet/601/base
2025-12-04T09:33:41.6906529Z  * [new branch]              gh/malfet/601/head          -> origin/gh/malfet/601/head
2025-12-04T09:33:41.6907883Z  * [new branch]              gh/malfet/601/orig          -> origin/gh/malfet/601/orig
2025-12-04T09:33:41.6909710Z  * [new branch]              gh/malfet/602/base          -> origin/gh/malfet/602/base
2025-12-04T09:33:41.6910951Z  * [new branch]              gh/malfet/602/head          -> origin/gh/malfet/602/head
2025-12-04T09:33:41.6912199Z  * [new branch]              gh/malfet/602/orig          -> origin/gh/malfet/602/orig
2025-12-04T09:33:41.6913858Z  * [new branch]              gh/malfet/603/base          -> origin/gh/malfet/603/base
2025-12-04T09:33:41.6915051Z  * [new branch]              gh/malfet/603/head          -> origin/gh/malfet/603/head
2025-12-04T09:33:41.6916328Z  * [new branch]              gh/malfet/603/orig          -> origin/gh/malfet/603/orig
2025-12-04T09:33:41.6918077Z  * [new branch]              gh/malfet/604/base          -> origin/gh/malfet/604/base
2025-12-04T09:33:41.6919322Z  * [new branch]              gh/malfet/604/head          -> origin/gh/malfet/604/head
2025-12-04T09:33:41.6920590Z  * [new branch]              gh/malfet/604/orig          -> origin/gh/malfet/604/orig
2025-12-04T09:33:41.6922392Z  * [new branch]              gh/malfet/605/base          -> origin/gh/malfet/605/base
2025-12-04T09:33:41.6923787Z  * [new branch]              gh/malfet/605/head          -> origin/gh/malfet/605/head
2025-12-04T09:33:41.6925272Z  * [new branch]              gh/malfet/605/orig          -> origin/gh/malfet/605/orig
2025-12-04T09:33:41.6927016Z  * [new branch]              gh/malfet/606/base          -> origin/gh/malfet/606/base
2025-12-04T09:33:41.6928388Z  * [new branch]              gh/malfet/606/head          -> origin/gh/malfet/606/head
2025-12-04T09:33:41.6929671Z  * [new branch]              gh/malfet/606/orig          -> origin/gh/malfet/606/orig
2025-12-04T09:33:41.6931396Z  * [new branch]              gh/malfet/607/base          -> origin/gh/malfet/607/base
2025-12-04T09:33:41.6932693Z  * [new branch]              gh/malfet/607/head          -> origin/gh/malfet/607/head
2025-12-04T09:33:41.6933987Z  * [new branch]              gh/malfet/607/orig          -> origin/gh/malfet/607/orig
2025-12-04T09:33:41.6935752Z  * [new branch]              gh/malfet/608/base          -> origin/gh/malfet/608/base
2025-12-04T09:33:41.6937027Z  * [new branch]              gh/malfet/608/head          -> origin/gh/malfet/608/head
2025-12-04T09:33:41.6938324Z  * [new branch]              gh/malfet/608/orig          -> origin/gh/malfet/608/orig
2025-12-04T09:33:41.6940569Z  * [new branch]              gh/malfet/609/base          -> origin/gh/malfet/609/base
2025-12-04T09:33:41.6941828Z  * [new branch]              gh/malfet/609/head          -> origin/gh/malfet/609/head
2025-12-04T09:33:41.6943258Z  * [new branch]              gh/malfet/609/orig          -> origin/gh/malfet/609/orig
2025-12-04T09:33:41.6945139Z  * [new branch]              gh/malfet/610/base          -> origin/gh/malfet/610/base
2025-12-04T09:33:41.6946350Z  * [new branch]              gh/malfet/610/head          -> origin/gh/malfet/610/head
2025-12-04T09:33:41.6947734Z  * [new branch]              gh/malfet/610/orig          -> origin/gh/malfet/610/orig
2025-12-04T09:33:41.6949443Z  * [new branch]              gh/malfet/611/base          -> origin/gh/malfet/611/base
2025-12-04T09:33:41.6950693Z  * [new branch]              gh/malfet/611/head          -> origin/gh/malfet/611/head
2025-12-04T09:33:41.6951964Z  * [new branch]              gh/malfet/611/orig          -> origin/gh/malfet/611/orig
2025-12-04T09:33:41.6953546Z  * [new branch]              gh/malfet/612/base          -> origin/gh/malfet/612/base
2025-12-04T09:33:41.6954807Z  * [new branch]              gh/malfet/612/head          -> origin/gh/malfet/612/head
2025-12-04T09:33:41.6956149Z  * [new branch]              gh/malfet/612/orig          -> origin/gh/malfet/612/orig
2025-12-04T09:33:41.6957952Z  * [new branch]              gh/malfet/64/base           -> origin/gh/malfet/64/base
2025-12-04T09:33:41.6959206Z  * [new branch]              gh/malfet/64/head           -> origin/gh/malfet/64/head
2025-12-04T09:33:41.6961545Z  * [new branch]              gh/manuelcandales/11/base   -> origin/gh/manuelcandales/11/base
2025-12-04T09:33:41.6962988Z  * [new branch]              gh/manuelcandales/11/head   -> origin/gh/manuelcandales/11/head
2025-12-04T09:33:41.6964298Z  * [new branch]              gh/manuelcandales/11/orig   -> origin/gh/manuelcandales/11/orig
2025-12-04T09:33:41.6966524Z  * [new branch]              gh/markkm/1/base            -> origin/gh/markkm/1/base
2025-12-04T09:33:41.6968628Z  * [new branch]              gh/masnesral/1/base         -> origin/gh/masnesral/1/base
2025-12-04T09:33:41.6969901Z  * [new branch]              gh/masnesral/1/head         -> origin/gh/masnesral/1/head
2025-12-04T09:33:41.6971172Z  * [new branch]              gh/masnesral/1/orig         -> origin/gh/masnesral/1/orig
2025-12-04T09:33:41.6973515Z  * [new branch]              gh/mhorowitz/0/base         -> origin/gh/mhorowitz/0/base
2025-12-04T09:33:41.6974908Z  * [new branch]              gh/mhorowitz/0/head         -> origin/gh/mhorowitz/0/head
2025-12-04T09:33:41.6976459Z  * [new branch]              gh/mhorowitz/1/base         -> origin/gh/mhorowitz/1/base
2025-12-04T09:33:41.6977758Z  * [new branch]              gh/mhorowitz/1/head         -> origin/gh/mhorowitz/1/head
2025-12-04T09:33:41.6979278Z  * [new branch]              gh/mhorowitz/2/base         -> origin/gh/mhorowitz/2/base
2025-12-04T09:33:41.6980578Z  * [new branch]              gh/mhorowitz/2/head         -> origin/gh/mhorowitz/2/head
2025-12-04T09:33:41.6982106Z  * [new branch]              gh/mhorowitz/3/base         -> origin/gh/mhorowitz/3/base
2025-12-04T09:33:41.6983333Z  * [new branch]              gh/mhorowitz/3/head         -> origin/gh/mhorowitz/3/head
2025-12-04T09:33:41.6984822Z  * [new branch]              gh/mhorowitz/4/base         -> origin/gh/mhorowitz/4/base
2025-12-04T09:33:41.6986044Z  * [new branch]              gh/mhorowitz/4/head         -> origin/gh/mhorowitz/4/head
2025-12-04T09:33:41.6987946Z  * [new branch]              gh/mhorowitz/5/base         -> origin/gh/mhorowitz/5/base
2025-12-04T09:33:41.6989161Z  * [new branch]              gh/mhorowitz/5/head         -> origin/gh/mhorowitz/5/head
2025-12-04T09:33:41.6990823Z  * [new branch]              gh/mhorowitz/6/base         -> origin/gh/mhorowitz/6/base
2025-12-04T09:33:41.6992071Z  * [new branch]              gh/mhorowitz/6/head         -> origin/gh/mhorowitz/6/head
2025-12-04T09:33:41.6994254Z  * [new branch]              gh/mikaylagawarecki/234/base -> origin/gh/mikaylagawarecki/234/base
2025-12-04T09:33:41.6995560Z  * [new branch]              gh/mikaylagawarecki/234/head -> origin/gh/mikaylagawarecki/234/head
2025-12-04T09:33:41.6997202Z  * [new branch]              gh/mikaylagawarecki/235/base -> origin/gh/mikaylagawarecki/235/base
2025-12-04T09:33:41.6998470Z  * [new branch]              gh/mikaylagawarecki/235/head -> origin/gh/mikaylagawarecki/235/head
2025-12-04T09:33:41.7000023Z  * [new branch]              gh/mikaylagawarecki/236/base -> origin/gh/mikaylagawarecki/236/base
2025-12-04T09:33:41.7001419Z  * [new branch]              gh/mikaylagawarecki/236/head -> origin/gh/mikaylagawarecki/236/head
2025-12-04T09:33:41.7003593Z  * [new branch]              gh/mikaylagawarecki/237/base -> origin/gh/mikaylagawarecki/237/base
2025-12-04T09:33:41.7004794Z  * [new branch]              gh/mikaylagawarecki/237/head -> origin/gh/mikaylagawarecki/237/head
2025-12-04T09:33:41.7006531Z  * [new branch]              gh/mikaylagawarecki/238/base -> origin/gh/mikaylagawarecki/238/base
2025-12-04T09:33:41.7007814Z  * [new branch]              gh/mikaylagawarecki/238/head -> origin/gh/mikaylagawarecki/238/head
2025-12-04T09:33:41.7009536Z  * [new branch]              gh/mikaylagawarecki/336/base -> origin/gh/mikaylagawarecki/336/base
2025-12-04T09:33:41.7010814Z  * [new branch]              gh/mikaylagawarecki/336/head -> origin/gh/mikaylagawarecki/336/head
2025-12-04T09:33:41.7012058Z  * [new branch]              gh/mikaylagawarecki/336/orig -> origin/gh/mikaylagawarecki/336/orig
2025-12-04T09:33:41.7013929Z  * [new branch]              gh/mikaylagawarecki/341/base -> origin/gh/mikaylagawarecki/341/base
2025-12-04T09:33:41.7015179Z  * [new branch]              gh/mikaylagawarecki/341/head -> origin/gh/mikaylagawarecki/341/head
2025-12-04T09:33:41.7016418Z  * [new branch]              gh/mikaylagawarecki/341/orig -> origin/gh/mikaylagawarecki/341/orig
2025-12-04T09:33:41.7018351Z  * [new branch]              gh/mikaylagawarecki/342/base -> origin/gh/mikaylagawarecki/342/base
2025-12-04T09:33:41.7019612Z  * [new branch]              gh/mikaylagawarecki/342/head -> origin/gh/mikaylagawarecki/342/head
2025-12-04T09:33:41.7020973Z  * [new branch]              gh/mikaylagawarecki/342/orig -> origin/gh/mikaylagawarecki/342/orig
2025-12-04T09:33:41.7022861Z  * [new branch]              gh/mikaylagawarecki/345/base -> origin/gh/mikaylagawarecki/345/base
2025-12-04T09:33:41.7024106Z  * [new branch]              gh/mikaylagawarecki/345/head -> origin/gh/mikaylagawarecki/345/head
2025-12-04T09:33:41.7025406Z  * [new branch]              gh/mikaylagawarecki/345/orig -> origin/gh/mikaylagawarecki/345/orig
2025-12-04T09:33:41.7027266Z  * [new branch]              gh/mikaylagawarecki/346/base -> origin/gh/mikaylagawarecki/346/base
2025-12-04T09:33:41.7028547Z  * [new branch]              gh/mikaylagawarecki/346/head -> origin/gh/mikaylagawarecki/346/head
2025-12-04T09:33:41.7029831Z  * [new branch]              gh/mikaylagawarecki/346/orig -> origin/gh/mikaylagawarecki/346/orig
2025-12-04T09:33:41.7031687Z  * [new branch]              gh/mikaylagawarecki/347/base -> origin/gh/mikaylagawarecki/347/base
2025-12-04T09:33:41.7032926Z  * [new branch]              gh/mikaylagawarecki/347/head -> origin/gh/mikaylagawarecki/347/head
2025-12-04T09:33:41.7034129Z  * [new branch]              gh/mikaylagawarecki/347/orig -> origin/gh/mikaylagawarecki/347/orig
2025-12-04T09:33:41.7035922Z  * [new branch]              gh/mikaylagawarecki/350/base -> origin/gh/mikaylagawarecki/350/base
2025-12-04T09:33:41.7037169Z  * [new branch]              gh/mikaylagawarecki/350/head -> origin/gh/mikaylagawarecki/350/head
2025-12-04T09:33:41.7038482Z  * [new branch]              gh/mikaylagawarecki/350/orig -> origin/gh/mikaylagawarecki/350/orig
2025-12-04T09:33:41.7040775Z  * [new branch]              gh/mikaylagawarecki/351/base -> origin/gh/mikaylagawarecki/351/base
2025-12-04T09:33:41.7042167Z  * [new branch]              gh/mikaylagawarecki/351/head -> origin/gh/mikaylagawarecki/351/head
2025-12-04T09:33:41.7043669Z  * [new branch]              gh/mikaylagawarecki/351/orig -> origin/gh/mikaylagawarecki/351/orig
2025-12-04T09:33:41.7045549Z  * [new branch]              gh/mikaylagawarecki/352/base -> origin/gh/mikaylagawarecki/352/base
2025-12-04T09:33:41.7047005Z  * [new branch]              gh/mikaylagawarecki/352/head -> origin/gh/mikaylagawarecki/352/head
2025-12-04T09:33:41.7048392Z  * [new branch]              gh/mikaylagawarecki/352/orig -> origin/gh/mikaylagawarecki/352/orig
2025-12-04T09:33:41.7050220Z  * [new branch]              gh/mikaylagawarecki/353/base -> origin/gh/mikaylagawarecki/353/base
2025-12-04T09:33:41.7051843Z  * [new branch]              gh/mikaylagawarecki/353/head -> origin/gh/mikaylagawarecki/353/head
2025-12-04T09:33:41.7053106Z  * [new branch]              gh/mikaylagawarecki/353/orig -> origin/gh/mikaylagawarecki/353/orig
2025-12-04T09:33:41.7054645Z  * [new branch]              gh/mikaylagawarecki/354/base -> origin/gh/mikaylagawarecki/354/base
2025-12-04T09:33:41.7055933Z  * [new branch]              gh/mikaylagawarecki/354/head -> origin/gh/mikaylagawarecki/354/head
2025-12-04T09:33:41.7057284Z  * [new branch]              gh/mikaylagawarecki/354/orig -> origin/gh/mikaylagawarecki/354/orig
2025-12-04T09:33:41.7059506Z  * [new branch]              gh/mikaylagawarecki/356/base -> origin/gh/mikaylagawarecki/356/base
2025-12-04T09:33:41.7060873Z  * [new branch]              gh/mikaylagawarecki/356/head -> origin/gh/mikaylagawarecki/356/head
2025-12-04T09:33:41.7062156Z  * [new branch]              gh/mikaylagawarecki/356/orig -> origin/gh/mikaylagawarecki/356/orig
2025-12-04T09:33:41.7064296Z  * [new branch]              gh/mikaylagawarecki/357/base -> origin/gh/mikaylagawarecki/357/base
2025-12-04T09:33:41.7065581Z  * [new branch]              gh/mikaylagawarecki/357/head -> origin/gh/mikaylagawarecki/357/head
2025-12-04T09:33:41.7066892Z  * [new branch]              gh/mikaylagawarecki/357/orig -> origin/gh/mikaylagawarecki/357/orig
2025-12-04T09:33:41.7068823Z  * [new branch]              gh/mikaylagawarecki/359/base -> origin/gh/mikaylagawarecki/359/base
2025-12-04T09:33:41.7070163Z  * [new branch]              gh/mikaylagawarecki/359/head -> origin/gh/mikaylagawarecki/359/head
2025-12-04T09:33:41.7071470Z  * [new branch]              gh/mikaylagawarecki/359/orig -> origin/gh/mikaylagawarecki/359/orig
2025-12-04T09:33:41.7073268Z  * [new branch]              gh/mikaylagawarecki/360/base -> origin/gh/mikaylagawarecki/360/base
2025-12-04T09:33:41.7074619Z  * [new branch]              gh/mikaylagawarecki/360/head -> origin/gh/mikaylagawarecki/360/head
2025-12-04T09:33:41.7075911Z  * [new branch]              gh/mikaylagawarecki/360/orig -> origin/gh/mikaylagawarecki/360/orig
2025-12-04T09:33:41.7078263Z  * [new branch]              gh/mikaylagawarecki/361/base -> origin/gh/mikaylagawarecki/361/base
2025-12-04T09:33:41.7079581Z  * [new branch]              gh/mikaylagawarecki/361/head -> origin/gh/mikaylagawarecki/361/head
2025-12-04T09:33:41.7080830Z  * [new branch]              gh/mikaylagawarecki/361/orig -> origin/gh/mikaylagawarecki/361/orig
2025-12-04T09:33:41.7082736Z  * [new branch]              gh/mikaylagawarecki/362/base -> origin/gh/mikaylagawarecki/362/base
2025-12-04T09:33:41.7084338Z  * [new branch]              gh/mikaylagawarecki/362/head -> origin/gh/mikaylagawarecki/362/head
2025-12-04T09:33:41.7085607Z  * [new branch]              gh/mikaylagawarecki/362/orig -> origin/gh/mikaylagawarecki/362/orig
2025-12-04T09:33:41.7087704Z  * [new branch]              gh/mikaylagawarecki/363/base -> origin/gh/mikaylagawarecki/363/base
2025-12-04T09:33:41.7089128Z  * [new branch]              gh/mikaylagawarecki/363/head -> origin/gh/mikaylagawarecki/363/head
2025-12-04T09:33:41.7090540Z  * [new branch]              gh/mikaylagawarecki/363/orig -> origin/gh/mikaylagawarecki/363/orig
2025-12-04T09:33:41.7093210Z  * [new branch]              gh/mikaylagawarecki/364/base -> origin/gh/mikaylagawarecki/364/base
2025-12-04T09:33:41.7094567Z  * [new branch]              gh/mikaylagawarecki/364/head -> origin/gh/mikaylagawarecki/364/head
2025-12-04T09:33:41.7095884Z  * [new branch]              gh/mikaylagawarecki/364/orig -> origin/gh/mikaylagawarecki/364/orig
2025-12-04T09:33:41.7097949Z  * [new branch]              gh/mikaylagawarecki/365/base -> origin/gh/mikaylagawarecki/365/base
2025-12-04T09:33:41.7099276Z  * [new branch]              gh/mikaylagawarecki/365/head -> origin/gh/mikaylagawarecki/365/head
2025-12-04T09:33:41.7100709Z  * [new branch]              gh/mikaylagawarecki/365/orig -> origin/gh/mikaylagawarecki/365/orig
2025-12-04T09:33:41.7102811Z  * [new branch]              gh/mikaylagawarecki/366/base -> origin/gh/mikaylagawarecki/366/base
2025-12-04T09:33:41.7104018Z  * [new branch]              gh/mikaylagawarecki/366/head -> origin/gh/mikaylagawarecki/366/head
2025-12-04T09:33:41.7105332Z  * [new branch]              gh/mikaylagawarecki/366/orig -> origin/gh/mikaylagawarecki/366/orig
2025-12-04T09:33:41.7107131Z  * [new branch]              gh/mikaylagawarecki/367/base -> origin/gh/mikaylagawarecki/367/base
2025-12-04T09:33:41.7108408Z  * [new branch]              gh/mikaylagawarecki/367/head -> origin/gh/mikaylagawarecki/367/head
2025-12-04T09:33:41.7109666Z  * [new branch]              gh/mikaylagawarecki/367/orig -> origin/gh/mikaylagawarecki/367/orig
2025-12-04T09:33:41.7111526Z  * [new branch]              gh/mikaylagawarecki/368/base -> origin/gh/mikaylagawarecki/368/base
2025-12-04T09:33:41.7112931Z  * [new branch]              gh/mikaylagawarecki/368/head -> origin/gh/mikaylagawarecki/368/head
2025-12-04T09:33:41.7114324Z  * [new branch]              gh/mikaylagawarecki/368/orig -> origin/gh/mikaylagawarecki/368/orig
2025-12-04T09:33:41.7116207Z  * [new branch]              gh/mikaylagawarecki/369/base -> origin/gh/mikaylagawarecki/369/base
2025-12-04T09:33:41.7117560Z  * [new branch]              gh/mikaylagawarecki/369/head -> origin/gh/mikaylagawarecki/369/head
2025-12-04T09:33:41.7118832Z  * [new branch]              gh/mikaylagawarecki/369/orig -> origin/gh/mikaylagawarecki/369/orig
2025-12-04T09:33:41.7120647Z  * [new branch]              gh/mikaylagawarecki/370/base -> origin/gh/mikaylagawarecki/370/base
2025-12-04T09:33:41.7121972Z  * [new branch]              gh/mikaylagawarecki/370/head -> origin/gh/mikaylagawarecki/370/head
2025-12-04T09:33:41.7123354Z  * [new branch]              gh/mikaylagawarecki/370/orig -> origin/gh/mikaylagawarecki/370/orig
2025-12-04T09:33:41.7125153Z  * [new branch]              gh/mikaylagawarecki/371/base -> origin/gh/mikaylagawarecki/371/base
2025-12-04T09:33:41.7126401Z  * [new branch]              gh/mikaylagawarecki/371/head -> origin/gh/mikaylagawarecki/371/head
2025-12-04T09:33:41.7127619Z  * [new branch]              gh/mikaylagawarecki/371/orig -> origin/gh/mikaylagawarecki/371/orig
2025-12-04T09:33:41.7129499Z  * [new branch]              gh/mikaylagawarecki/372/base -> origin/gh/mikaylagawarecki/372/base
2025-12-04T09:33:41.7130764Z  * [new branch]              gh/mikaylagawarecki/372/head -> origin/gh/mikaylagawarecki/372/head
2025-12-04T09:33:41.7132029Z  * [new branch]              gh/mikaylagawarecki/372/orig -> origin/gh/mikaylagawarecki/372/orig
2025-12-04T09:33:41.7133799Z  * [new branch]              gh/mikaylagawarecki/373/base -> origin/gh/mikaylagawarecki/373/base
2025-12-04T09:33:41.7135119Z  * [new branch]              gh/mikaylagawarecki/373/head -> origin/gh/mikaylagawarecki/373/head
2025-12-04T09:33:41.7136409Z  * [new branch]              gh/mikaylagawarecki/373/orig -> origin/gh/mikaylagawarecki/373/orig
2025-12-04T09:33:41.7138142Z  * [new branch]              gh/mikaylagawarecki/374/base -> origin/gh/mikaylagawarecki/374/base
2025-12-04T09:33:41.7139429Z  * [new branch]              gh/mikaylagawarecki/374/head -> origin/gh/mikaylagawarecki/374/head
2025-12-04T09:33:41.7140747Z  * [new branch]              gh/mikaylagawarecki/374/orig -> origin/gh/mikaylagawarecki/374/orig
2025-12-04T09:33:41.7142449Z  * [new branch]              gh/mikaylagawarecki/375/base -> origin/gh/mikaylagawarecki/375/base
2025-12-04T09:33:41.7143816Z  * [new branch]              gh/mikaylagawarecki/375/head -> origin/gh/mikaylagawarecki/375/head
2025-12-04T09:33:41.7145094Z  * [new branch]              gh/mikaylagawarecki/375/orig -> origin/gh/mikaylagawarecki/375/orig
2025-12-04T09:33:41.7146950Z  * [new branch]              gh/mikaylagawarecki/376/base -> origin/gh/mikaylagawarecki/376/base
2025-12-04T09:33:41.7148375Z  * [new branch]              gh/mikaylagawarecki/376/head -> origin/gh/mikaylagawarecki/376/head
2025-12-04T09:33:41.7149609Z  * [new branch]              gh/mikaylagawarecki/376/orig -> origin/gh/mikaylagawarecki/376/orig
2025-12-04T09:33:41.7151448Z  * [new branch]              gh/mikaylagawarecki/377/base -> origin/gh/mikaylagawarecki/377/base
2025-12-04T09:33:41.7152827Z  * [new branch]              gh/mikaylagawarecki/377/head -> origin/gh/mikaylagawarecki/377/head
2025-12-04T09:33:41.7154212Z  * [new branch]              gh/mikaylagawarecki/377/orig -> origin/gh/mikaylagawarecki/377/orig
2025-12-04T09:33:41.7155939Z  * [new branch]              gh/mikaylagawarecki/378/base -> origin/gh/mikaylagawarecki/378/base
2025-12-04T09:33:41.7157291Z  * [new branch]              gh/mikaylagawarecki/378/head -> origin/gh/mikaylagawarecki/378/head
2025-12-04T09:33:41.7158598Z  * [new branch]              gh/mikaylagawarecki/378/orig -> origin/gh/mikaylagawarecki/378/orig
2025-12-04T09:33:41.7160349Z  * [new branch]              gh/mikaylagawarecki/379/base -> origin/gh/mikaylagawarecki/379/base
2025-12-04T09:33:41.7161619Z  * [new branch]              gh/mikaylagawarecki/379/head -> origin/gh/mikaylagawarecki/379/head
2025-12-04T09:33:41.7163047Z  * [new branch]              gh/mikaylagawarecki/379/orig -> origin/gh/mikaylagawarecki/379/orig
2025-12-04T09:33:41.7164665Z  * [new branch]              gh/mikaylagawarecki/380/base -> origin/gh/mikaylagawarecki/380/base
2025-12-04T09:33:41.7165914Z  * [new branch]              gh/mikaylagawarecki/380/head -> origin/gh/mikaylagawarecki/380/head
2025-12-04T09:33:41.7167173Z  * [new branch]              gh/mikaylagawarecki/380/orig -> origin/gh/mikaylagawarecki/380/orig
2025-12-04T09:33:41.7168839Z  * [new branch]              gh/mikaylagawarecki/381/base -> origin/gh/mikaylagawarecki/381/base
2025-12-04T09:33:41.7170141Z  * [new branch]              gh/mikaylagawarecki/381/head -> origin/gh/mikaylagawarecki/381/head
2025-12-04T09:33:41.7171399Z  * [new branch]              gh/mikaylagawarecki/381/orig -> origin/gh/mikaylagawarecki/381/orig
2025-12-04T09:33:41.7172950Z  * [new branch]              gh/mikaylagawarecki/382/base -> origin/gh/mikaylagawarecki/382/base
2025-12-04T09:33:41.7174259Z  * [new branch]              gh/mikaylagawarecki/382/head -> origin/gh/mikaylagawarecki/382/head
2025-12-04T09:33:41.7175487Z  * [new branch]              gh/mikaylagawarecki/382/orig -> origin/gh/mikaylagawarecki/382/orig
2025-12-04T09:33:41.7177337Z  * [new branch]              gh/mikaylagawarecki/383/base -> origin/gh/mikaylagawarecki/383/base
2025-12-04T09:33:41.7178693Z  * [new branch]              gh/mikaylagawarecki/383/head -> origin/gh/mikaylagawarecki/383/head
2025-12-04T09:33:41.7179983Z  * [new branch]              gh/mikaylagawarecki/383/orig -> origin/gh/mikaylagawarecki/383/orig
2025-12-04T09:33:41.7181684Z  * [new branch]              gh/mikaylagawarecki/384/base -> origin/gh/mikaylagawarecki/384/base
2025-12-04T09:33:41.7182967Z  * [new branch]              gh/mikaylagawarecki/384/head -> origin/gh/mikaylagawarecki/384/head
2025-12-04T09:33:41.7184232Z  * [new branch]              gh/mikaylagawarecki/384/orig -> origin/gh/mikaylagawarecki/384/orig
2025-12-04T09:33:41.7186053Z  * [new branch]              gh/mikaylagawarecki/385/base -> origin/gh/mikaylagawarecki/385/base
2025-12-04T09:33:41.7187430Z  * [new branch]              gh/mikaylagawarecki/385/head -> origin/gh/mikaylagawarecki/385/head
2025-12-04T09:33:41.7188705Z  * [new branch]              gh/mikaylagawarecki/385/orig -> origin/gh/mikaylagawarecki/385/orig
2025-12-04T09:33:41.7190618Z  * [new branch]              gh/mikaylagawarecki/386/base -> origin/gh/mikaylagawarecki/386/base
2025-12-04T09:33:41.7191834Z  * [new branch]              gh/mikaylagawarecki/386/head -> origin/gh/mikaylagawarecki/386/head
2025-12-04T09:33:41.7193330Z  * [new branch]              gh/mikaylagawarecki/386/orig -> origin/gh/mikaylagawarecki/386/orig
2025-12-04T09:33:41.7195301Z  * [new branch]              gh/mikaylagawarecki/387/base -> origin/gh/mikaylagawarecki/387/base
2025-12-04T09:33:41.7197027Z  * [new branch]              gh/mikaylagawarecki/387/head -> origin/gh/mikaylagawarecki/387/head
2025-12-04T09:33:41.7198405Z  * [new branch]              gh/mikaylagawarecki/387/orig -> origin/gh/mikaylagawarecki/387/orig
2025-12-04T09:33:41.7199952Z  * [new branch]              gh/mikaylagawarecki/388/base -> origin/gh/mikaylagawarecki/388/base
2025-12-04T09:33:41.7204082Z  * [new branch]              gh/mikaylagawarecki/388/head -> origin/gh/mikaylagawarecki/388/head
2025-12-04T09:33:41.7205904Z  * [new branch]              gh/mikaylagawarecki/388/orig -> origin/gh/mikaylagawarecki/388/orig
2025-12-04T09:33:41.7207791Z  * [new branch]              gh/mikaylagawarecki/389/base -> origin/gh/mikaylagawarecki/389/base
2025-12-04T09:33:41.7209070Z  * [new branch]              gh/mikaylagawarecki/389/head -> origin/gh/mikaylagawarecki/389/head
2025-12-04T09:33:41.7210347Z  * [new branch]              gh/mikaylagawarecki/389/orig -> origin/gh/mikaylagawarecki/389/orig
2025-12-04T09:33:41.7212753Z  * [new branch]              gh/mikaylagawarecki/390/base -> origin/gh/mikaylagawarecki/390/base
2025-12-04T09:33:41.7214011Z  * [new branch]              gh/mikaylagawarecki/390/head -> origin/gh/mikaylagawarecki/390/head
2025-12-04T09:33:41.7215268Z  * [new branch]              gh/mikaylagawarecki/390/orig -> origin/gh/mikaylagawarecki/390/orig
2025-12-04T09:33:41.7217677Z  * [new branch]              gh/mikaylagawarecki/391/base -> origin/gh/mikaylagawarecki/391/base
2025-12-04T09:33:41.7219096Z  * [new branch]              gh/mikaylagawarecki/391/head -> origin/gh/mikaylagawarecki/391/head
2025-12-04T09:33:41.7220414Z  * [new branch]              gh/mikaylagawarecki/391/orig -> origin/gh/mikaylagawarecki/391/orig
2025-12-04T09:33:41.7222373Z  * [new branch]              gh/mikaylagawarecki/392/base -> origin/gh/mikaylagawarecki/392/base
2025-12-04T09:33:41.7223701Z  * [new branch]              gh/mikaylagawarecki/392/head -> origin/gh/mikaylagawarecki/392/head
2025-12-04T09:33:41.7224998Z  * [new branch]              gh/mikaylagawarecki/392/orig -> origin/gh/mikaylagawarecki/392/orig
2025-12-04T09:33:41.7227010Z  * [new branch]              gh/mlazos/41/base           -> origin/gh/mlazos/41/base
2025-12-04T09:33:41.7228296Z  * [new branch]              gh/mlazos/41/head           -> origin/gh/mlazos/41/head
2025-12-04T09:33:41.7229592Z  * [new branch]              gh/mlazos/41/orig           -> origin/gh/mlazos/41/orig
2025-12-04T09:33:41.7231392Z  * [new branch]              gh/mlazos/42/base           -> origin/gh/mlazos/42/base
2025-12-04T09:33:41.7232713Z  * [new branch]              gh/mlazos/42/head           -> origin/gh/mlazos/42/head
2025-12-04T09:33:41.7233988Z  * [new branch]              gh/mlazos/42/orig           -> origin/gh/mlazos/42/orig
2025-12-04T09:33:41.7235495Z  * [new branch]              gh/mlazos/43/base           -> origin/gh/mlazos/43/base
2025-12-04T09:33:41.7236876Z  * [new branch]              gh/mlazos/43/head           -> origin/gh/mlazos/43/head
2025-12-04T09:33:41.7238133Z  * [new branch]              gh/mlazos/43/orig           -> origin/gh/mlazos/43/orig
2025-12-04T09:33:41.7239673Z  * [new branch]              gh/mlazos/44/base           -> origin/gh/mlazos/44/base
2025-12-04T09:33:41.7240949Z  * [new branch]              gh/mlazos/44/head           -> origin/gh/mlazos/44/head
2025-12-04T09:33:41.7242282Z  * [new branch]              gh/mlazos/44/orig           -> origin/gh/mlazos/44/orig
2025-12-04T09:33:41.7244039Z  * [new branch]              gh/mlazos/47/base           -> origin/gh/mlazos/47/base
2025-12-04T09:33:41.7245301Z  * [new branch]              gh/mlazos/47/head           -> origin/gh/mlazos/47/head
2025-12-04T09:33:41.7246546Z  * [new branch]              gh/mlazos/47/orig           -> origin/gh/mlazos/47/orig
2025-12-04T09:33:41.7248135Z  * [new branch]              gh/mlazos/48/base           -> origin/gh/mlazos/48/base
2025-12-04T09:33:41.7249589Z  * [new branch]              gh/mlazos/48/head           -> origin/gh/mlazos/48/head
2025-12-04T09:33:41.7250783Z  * [new branch]              gh/mlazos/48/orig           -> origin/gh/mlazos/48/orig
2025-12-04T09:33:41.7252439Z  * [new branch]              gh/mlazos/49/base           -> origin/gh/mlazos/49/base
2025-12-04T09:33:41.7253703Z  * [new branch]              gh/mlazos/49/head           -> origin/gh/mlazos/49/head
2025-12-04T09:33:41.7255685Z  * [new branch]              gh/mlazos/49/orig           -> origin/gh/mlazos/49/orig
2025-12-04T09:33:41.7257162Z  * [new branch]              gh/mlazos/50/base           -> origin/gh/mlazos/50/base
2025-12-04T09:33:41.7258405Z  * [new branch]              gh/mlazos/50/head           -> origin/gh/mlazos/50/head
2025-12-04T09:33:41.7259659Z  * [new branch]              gh/mlazos/50/orig           -> origin/gh/mlazos/50/orig
2025-12-04T09:33:41.7261429Z  * [new branch]              gh/mlazos/51/base           -> origin/gh/mlazos/51/base
2025-12-04T09:33:41.7262456Z  * [new branch]              gh/mlazos/51/head           -> origin/gh/mlazos/51/head
2025-12-04T09:33:41.7263707Z  * [new branch]              gh/mlazos/51/orig           -> origin/gh/mlazos/51/orig
2025-12-04T09:33:41.7265389Z  * [new branch]              gh/mlazos/52/base           -> origin/gh/mlazos/52/base
2025-12-04T09:33:41.7266710Z  * [new branch]              gh/mlazos/52/head           -> origin/gh/mlazos/52/head
2025-12-04T09:33:41.7267996Z  * [new branch]              gh/mlazos/52/orig           -> origin/gh/mlazos/52/orig
2025-12-04T09:33:41.7269692Z  * [new branch]              gh/mlazos/53/base           -> origin/gh/mlazos/53/base
2025-12-04T09:33:41.7271029Z  * [new branch]              gh/mlazos/53/head           -> origin/gh/mlazos/53/head
2025-12-04T09:33:41.7272243Z  * [new branch]              gh/mlazos/53/orig           -> origin/gh/mlazos/53/orig
2025-12-04T09:33:41.7274470Z  * [new branch]              gh/mlazos/54/base           -> origin/gh/mlazos/54/base
2025-12-04T09:33:41.7275743Z  * [new branch]              gh/mlazos/54/head           -> origin/gh/mlazos/54/head
2025-12-04T09:33:41.7277031Z  * [new branch]              gh/mlazos/54/orig           -> origin/gh/mlazos/54/orig
2025-12-04T09:33:41.7278628Z  * [new branch]              gh/mlazos/55/base           -> origin/gh/mlazos/55/base
2025-12-04T09:33:41.7279928Z  * [new branch]              gh/mlazos/55/head           -> origin/gh/mlazos/55/head
2025-12-04T09:33:41.7281205Z  * [new branch]              gh/mlazos/55/orig           -> origin/gh/mlazos/55/orig
2025-12-04T09:33:41.7283060Z  * [new branch]              gh/mlazos/56/base           -> origin/gh/mlazos/56/base
2025-12-04T09:33:41.7284415Z  * [new branch]              gh/mlazos/56/head           -> origin/gh/mlazos/56/head
2025-12-04T09:33:41.7285654Z  * [new branch]              gh/mlazos/56/orig           -> origin/gh/mlazos/56/orig
2025-12-04T09:33:41.7287313Z  * [new branch]              gh/mlazos/57/base           -> origin/gh/mlazos/57/base
2025-12-04T09:33:41.7288584Z  * [new branch]              gh/mlazos/57/head           -> origin/gh/mlazos/57/head
2025-12-04T09:33:41.7289759Z  * [new branch]              gh/mlazos/57/orig           -> origin/gh/mlazos/57/orig
2025-12-04T09:33:41.7291496Z  * [new branch]              gh/mlazos/58/base           -> origin/gh/mlazos/58/base
2025-12-04T09:33:41.7292796Z  * [new branch]              gh/mlazos/58/head           -> origin/gh/mlazos/58/head
2025-12-04T09:33:41.7294077Z  * [new branch]              gh/mlazos/58/orig           -> origin/gh/mlazos/58/orig
2025-12-04T09:33:41.7295792Z  * [new branch]              gh/mlazos/59/base           -> origin/gh/mlazos/59/base
2025-12-04T09:33:41.7297057Z  * [new branch]              gh/mlazos/59/head           -> origin/gh/mlazos/59/head
2025-12-04T09:33:41.7298264Z  * [new branch]              gh/mlazos/59/orig           -> origin/gh/mlazos/59/orig
2025-12-04T09:33:41.7301649Z  * [new branch]              gh/mlazos/60/base           -> origin/gh/mlazos/60/base
2025-12-04T09:33:41.7302746Z  * [new branch]              gh/mlazos/60/head           -> origin/gh/mlazos/60/head
2025-12-04T09:33:41.7303197Z  * [new branch]              gh/mlazos/60/orig           -> origin/gh/mlazos/60/orig
2025-12-04T09:33:41.7305412Z  * [new branch]              gh/mlazos/61/base           -> origin/gh/mlazos/61/base
2025-12-04T09:33:41.7306682Z  * [new branch]              gh/mlazos/61/head           -> origin/gh/mlazos/61/head
2025-12-04T09:33:41.7308020Z  * [new branch]              gh/mlazos/61/orig           -> origin/gh/mlazos/61/orig
2025-12-04T09:33:41.7309753Z  * [new branch]              gh/mlazos/62/base           -> origin/gh/mlazos/62/base
2025-12-04T09:33:41.7311021Z  * [new branch]              gh/mlazos/62/head           -> origin/gh/mlazos/62/head
2025-12-04T09:33:41.7312869Z  * [new branch]              gh/mlazos/62/orig           -> origin/gh/mlazos/62/orig
2025-12-04T09:33:41.7314690Z  * [new branch]              gh/mlazos/63/base           -> origin/gh/mlazos/63/base
2025-12-04T09:33:41.7316031Z  * [new branch]              gh/mlazos/63/head           -> origin/gh/mlazos/63/head
2025-12-04T09:33:41.7317314Z  * [new branch]              gh/mlazos/63/orig           -> origin/gh/mlazos/63/orig
2025-12-04T09:33:41.7319036Z  * [new branch]              gh/mlazos/64/base           -> origin/gh/mlazos/64/base
2025-12-04T09:33:41.7320497Z  * [new branch]              gh/mlazos/64/head           -> origin/gh/mlazos/64/head
2025-12-04T09:33:41.7321731Z  * [new branch]              gh/mlazos/64/orig           -> origin/gh/mlazos/64/orig
2025-12-04T09:33:41.7323611Z  * [new branch]              gh/mlazos/65/base           -> origin/gh/mlazos/65/base
2025-12-04T09:33:41.7324864Z  * [new branch]              gh/mlazos/65/head           -> origin/gh/mlazos/65/head
2025-12-04T09:33:41.7326122Z  * [new branch]              gh/mlazos/65/orig           -> origin/gh/mlazos/65/orig
2025-12-04T09:33:41.7327861Z  * [new branch]              gh/mlazos/66/base           -> origin/gh/mlazos/66/base
2025-12-04T09:33:41.7329116Z  * [new branch]              gh/mlazos/66/head           -> origin/gh/mlazos/66/head
2025-12-04T09:33:41.7330386Z  * [new branch]              gh/mlazos/66/orig           -> origin/gh/mlazos/66/orig
2025-12-04T09:33:41.7332062Z  * [new branch]              gh/mlazos/67/base           -> origin/gh/mlazos/67/base
2025-12-04T09:33:41.7333388Z  * [new branch]              gh/mlazos/67/head           -> origin/gh/mlazos/67/head
2025-12-04T09:33:41.7334602Z  * [new branch]              gh/mlazos/67/orig           -> origin/gh/mlazos/67/orig
2025-12-04T09:33:41.7336316Z  * [new branch]              gh/mlazos/68/base           -> origin/gh/mlazos/68/base
2025-12-04T09:33:41.7337673Z  * [new branch]              gh/mlazos/68/head           -> origin/gh/mlazos/68/head
2025-12-04T09:33:41.7338963Z  * [new branch]              gh/mlazos/68/orig           -> origin/gh/mlazos/68/orig
2025-12-04T09:33:41.7340697Z  * [new branch]              gh/mlazos/69/base           -> origin/gh/mlazos/69/base
2025-12-04T09:33:41.7341966Z  * [new branch]              gh/mlazos/69/head           -> origin/gh/mlazos/69/head
2025-12-04T09:33:41.7343204Z  * [new branch]              gh/mlazos/69/orig           -> origin/gh/mlazos/69/orig
2025-12-04T09:33:41.7353059Z  * [new branch]              gh/mlazos/70/base           -> origin/gh/mlazos/70/base
2025-12-04T09:33:41.7353377Z  * [new branch]              gh/mlazos/70/head           -> origin/gh/mlazos/70/head
2025-12-04T09:33:41.7353632Z  * [new branch]              gh/mlazos/70/orig           -> origin/gh/mlazos/70/orig
2025-12-04T09:33:41.7353868Z  * [new branch]              gh/mlazos/71/base           -> origin/gh/mlazos/71/base
2025-12-04T09:33:41.7354101Z  * [new branch]              gh/mlazos/71/head           -> origin/gh/mlazos/71/head
2025-12-04T09:33:41.7354350Z  * [new branch]              gh/mlazos/71/orig           -> origin/gh/mlazos/71/orig
2025-12-04T09:33:41.7354583Z  * [new branch]              gh/mlazos/72/base           -> origin/gh/mlazos/72/base
2025-12-04T09:33:41.7354955Z  * [new branch]              gh/mlazos/72/head           -> origin/gh/mlazos/72/head
2025-12-04T09:33:41.7356042Z  * [new branch]              gh/mlazos/72/orig           -> origin/gh/mlazos/72/orig
2025-12-04T09:33:41.7357802Z  * [new branch]              gh/mlazos/73/base           -> origin/gh/mlazos/73/base
2025-12-04T09:33:41.7359100Z  * [new branch]              gh/mlazos/73/head           -> origin/gh/mlazos/73/head
2025-12-04T09:33:41.7360347Z  * [new branch]              gh/mlazos/73/orig           -> origin/gh/mlazos/73/orig
2025-12-04T09:33:41.7362508Z  * [new branch]              gh/mrmiywj/1/base           -> origin/gh/mrmiywj/1/base
2025-12-04T09:33:41.7363958Z  * [new branch]              gh/mrmiywj/1/head           -> origin/gh/mrmiywj/1/head
2025-12-04T09:33:41.7366065Z  * [new branch]              gh/muchulee8/73/base        -> origin/gh/muchulee8/73/base
2025-12-04T09:33:41.7367542Z  * [new branch]              gh/muchulee8/73/head        -> origin/gh/muchulee8/73/head
2025-12-04T09:33:41.7369356Z  * [new branch]              gh/muchulee8/73/orig        -> origin/gh/muchulee8/73/orig
2025-12-04T09:33:41.7371640Z  * [new branch]              gh/naveenthangudu/1/base    -> origin/gh/naveenthangudu/1/base
2025-12-04T09:33:41.7372939Z  * [new branch]              gh/naveenthangudu/1/head    -> origin/gh/naveenthangudu/1/head
2025-12-04T09:33:41.7374372Z  * [new branch]              gh/naveenthangudu/1/orig    -> origin/gh/naveenthangudu/1/orig
2025-12-04T09:33:41.7376026Z  * [new branch]              gh/naveenthangudu/2/base    -> origin/gh/naveenthangudu/2/base
2025-12-04T09:33:41.7377345Z  * [new branch]              gh/naveenthangudu/2/head    -> origin/gh/naveenthangudu/2/head
2025-12-04T09:33:41.7378668Z  * [new branch]              gh/naveenthangudu/2/orig    -> origin/gh/naveenthangudu/2/orig
2025-12-04T09:33:41.7380258Z  * [new branch]              gh/naveenthangudu/3/base    -> origin/gh/naveenthangudu/3/base
2025-12-04T09:33:41.7381541Z  * [new branch]              gh/naveenthangudu/3/head    -> origin/gh/naveenthangudu/3/head
2025-12-04T09:33:41.7382863Z  * [new branch]              gh/naveenthangudu/3/orig    -> origin/gh/naveenthangudu/3/orig
2025-12-04T09:33:41.7384556Z  * [new branch]              gh/naveenthangudu/4/base    -> origin/gh/naveenthangudu/4/base
2025-12-04T09:33:41.7385790Z  * [new branch]              gh/naveenthangudu/4/head    -> origin/gh/naveenthangudu/4/head
2025-12-04T09:33:41.7387311Z  * [new branch]              gh/naveenthangudu/4/orig    -> origin/gh/naveenthangudu/4/orig
2025-12-04T09:33:41.7389095Z  * [new branch]              gh/naveenthangudu/5/base    -> origin/gh/naveenthangudu/5/base
2025-12-04T09:33:41.7390384Z  * [new branch]              gh/naveenthangudu/5/head    -> origin/gh/naveenthangudu/5/head
2025-12-04T09:33:41.7391870Z  * [new branch]              gh/naveenthangudu/5/orig    -> origin/gh/naveenthangudu/5/orig
2025-12-04T09:33:41.7393543Z  * [new branch]              gh/naveenthangudu/6/base    -> origin/gh/naveenthangudu/6/base
2025-12-04T09:33:41.7394863Z  * [new branch]              gh/naveenthangudu/6/head    -> origin/gh/naveenthangudu/6/head
2025-12-04T09:33:41.7396061Z  * [new branch]              gh/naveenthangudu/6/orig    -> origin/gh/naveenthangudu/6/orig
2025-12-04T09:33:41.7397728Z  * [new branch]              gh/naveenthangudu/7/base    -> origin/gh/naveenthangudu/7/base
2025-12-04T09:33:41.7399006Z  * [new branch]              gh/naveenthangudu/7/head    -> origin/gh/naveenthangudu/7/head
2025-12-04T09:33:41.7400196Z  * [new branch]              gh/naveenthangudu/7/orig    -> origin/gh/naveenthangudu/7/orig
2025-12-04T09:33:41.7402074Z  * [new branch]              gh/naveenthangudu/8/base    -> origin/gh/naveenthangudu/8/base
2025-12-04T09:33:41.7403523Z  * [new branch]              gh/naveenthangudu/8/head    -> origin/gh/naveenthangudu/8/head
2025-12-04T09:33:41.7405229Z  * [new branch]              gh/naveenthangudu/8/orig    -> origin/gh/naveenthangudu/8/orig
2025-12-04T09:33:41.7407187Z  * [new branch]              gh/naveenthangudu/9/base    -> origin/gh/naveenthangudu/9/base
2025-12-04T09:33:41.7408346Z  * [new branch]              gh/naveenthangudu/9/head    -> origin/gh/naveenthangudu/9/head
2025-12-04T09:33:41.7409657Z  * [new branch]              gh/naveenthangudu/9/orig    -> origin/gh/naveenthangudu/9/orig
2025-12-04T09:33:41.7411612Z  * [new branch]              gh/nikitaved/1/base         -> origin/gh/nikitaved/1/base
2025-12-04T09:33:41.7412957Z  * [new branch]              gh/nikitaved/1/head         -> origin/gh/nikitaved/1/head
2025-12-04T09:33:41.7414214Z  * [new branch]              gh/nikitaved/1/orig         -> origin/gh/nikitaved/1/orig
2025-12-04T09:33:41.7415982Z  * [new branch]              gh/nikitaved/10/base        -> origin/gh/nikitaved/10/base
2025-12-04T09:33:41.7417253Z  * [new branch]              gh/nikitaved/10/head        -> origin/gh/nikitaved/10/head
2025-12-04T09:33:41.7418499Z  * [new branch]              gh/nikitaved/10/orig        -> origin/gh/nikitaved/10/orig
2025-12-04T09:33:41.7420098Z  * [new branch]              gh/nikitaved/11/base        -> origin/gh/nikitaved/11/base
2025-12-04T09:33:41.7421451Z  * [new branch]              gh/nikitaved/11/head        -> origin/gh/nikitaved/11/head
2025-12-04T09:33:41.7422803Z  * [new branch]              gh/nikitaved/11/orig        -> origin/gh/nikitaved/11/orig
2025-12-04T09:33:41.7424962Z  * [new branch]              gh/nikitaved/12/base        -> origin/gh/nikitaved/12/base
2025-12-04T09:33:41.7426265Z  * [new branch]              gh/nikitaved/12/head        -> origin/gh/nikitaved/12/head
2025-12-04T09:33:41.7427530Z  * [new branch]              gh/nikitaved/12/orig        -> origin/gh/nikitaved/12/orig
2025-12-04T09:33:41.7429219Z  * [new branch]              gh/nikitaved/13/base        -> origin/gh/nikitaved/13/base
2025-12-04T09:33:41.7430550Z  * [new branch]              gh/nikitaved/13/head        -> origin/gh/nikitaved/13/head
2025-12-04T09:33:41.7431838Z  * [new branch]              gh/nikitaved/13/orig        -> origin/gh/nikitaved/13/orig
2025-12-04T09:33:41.7433627Z  * [new branch]              gh/nikitaved/14/base        -> origin/gh/nikitaved/14/base
2025-12-04T09:33:41.7434870Z  * [new branch]              gh/nikitaved/14/head        -> origin/gh/nikitaved/14/head
2025-12-04T09:33:41.7436125Z  * [new branch]              gh/nikitaved/14/orig        -> origin/gh/nikitaved/14/orig
2025-12-04T09:33:41.7437680Z  * [new branch]              gh/nikitaved/15/base        -> origin/gh/nikitaved/15/base
2025-12-04T09:33:41.7438959Z  * [new branch]              gh/nikitaved/15/head        -> origin/gh/nikitaved/15/head
2025-12-04T09:33:41.7440316Z  * [new branch]              gh/nikitaved/15/orig        -> origin/gh/nikitaved/15/orig
2025-12-04T09:33:41.7442002Z  * [new branch]              gh/nikitaved/16/base        -> origin/gh/nikitaved/16/base
2025-12-04T09:33:41.7443408Z  * [new branch]              gh/nikitaved/16/head        -> origin/gh/nikitaved/16/head
2025-12-04T09:33:41.7444645Z  * [new branch]              gh/nikitaved/16/orig        -> origin/gh/nikitaved/16/orig
2025-12-04T09:33:41.7446408Z  * [new branch]              gh/nikitaved/2/base         -> origin/gh/nikitaved/2/base
2025-12-04T09:33:41.7447680Z  * [new branch]              gh/nikitaved/2/head         -> origin/gh/nikitaved/2/head
2025-12-04T09:33:41.7448927Z  * [new branch]              gh/nikitaved/2/orig         -> origin/gh/nikitaved/2/orig
2025-12-04T09:33:41.7450595Z  * [new branch]              gh/nikitaved/4/base         -> origin/gh/nikitaved/4/base
2025-12-04T09:33:41.7451866Z  * [new branch]              gh/nikitaved/4/head         -> origin/gh/nikitaved/4/head
2025-12-04T09:33:41.7453157Z  * [new branch]              gh/nikitaved/4/orig         -> origin/gh/nikitaved/4/orig
2025-12-04T09:33:41.7454861Z  * [new branch]              gh/nikitaved/5/base         -> origin/gh/nikitaved/5/base
2025-12-04T09:33:41.7456197Z  * [new branch]              gh/nikitaved/5/head         -> origin/gh/nikitaved/5/head
2025-12-04T09:33:41.7457666Z  * [new branch]              gh/nikitaved/5/orig         -> origin/gh/nikitaved/5/orig
2025-12-04T09:33:41.7459235Z  * [new branch]              gh/nikitaved/6/base         -> origin/gh/nikitaved/6/base
2025-12-04T09:33:41.7460555Z  * [new branch]              gh/nikitaved/6/head         -> origin/gh/nikitaved/6/head
2025-12-04T09:33:41.7461810Z  * [new branch]              gh/nikitaved/6/orig         -> origin/gh/nikitaved/6/orig
2025-12-04T09:33:41.7463491Z  * [new branch]              gh/nikitaved/8/base         -> origin/gh/nikitaved/8/base
2025-12-04T09:33:41.7464751Z  * [new branch]              gh/nikitaved/8/head         -> origin/gh/nikitaved/8/head
2025-12-04T09:33:41.7466022Z  * [new branch]              gh/nikitaved/8/orig         -> origin/gh/nikitaved/8/orig
2025-12-04T09:33:41.7468235Z  * [new branch]              gh/nikitaved/9/base         -> origin/gh/nikitaved/9/base
2025-12-04T09:33:41.7469514Z  * [new branch]              gh/nikitaved/9/head         -> origin/gh/nikitaved/9/head
2025-12-04T09:33:41.7470779Z  * [new branch]              gh/nikitaved/9/orig         -> origin/gh/nikitaved/9/orig
2025-12-04T09:33:41.7472751Z  * [new branch]              gh/oulgen/10/base           -> origin/gh/oulgen/10/base
2025-12-04T09:33:41.7474122Z  * [new branch]              gh/oulgen/10/head           -> origin/gh/oulgen/10/head
2025-12-04T09:33:41.7475392Z  * [new branch]              gh/oulgen/10/orig           -> origin/gh/oulgen/10/orig
2025-12-04T09:33:41.7477045Z  * [new branch]              gh/oulgen/11/base           -> origin/gh/oulgen/11/base
2025-12-04T09:33:41.7478314Z  * [new branch]              gh/oulgen/11/head           -> origin/gh/oulgen/11/head
2025-12-04T09:33:41.7479580Z  * [new branch]              gh/oulgen/11/orig           -> origin/gh/oulgen/11/orig
2025-12-04T09:33:41.7481226Z  * [new branch]              gh/oulgen/12/base           -> origin/gh/oulgen/12/base
2025-12-04T09:33:41.7482494Z  * [new branch]              gh/oulgen/12/head           -> origin/gh/oulgen/12/head
2025-12-04T09:33:41.7483836Z  * [new branch]              gh/oulgen/12/orig           -> origin/gh/oulgen/12/orig
2025-12-04T09:33:41.7485456Z  * [new branch]              gh/oulgen/13/base           -> origin/gh/oulgen/13/base
2025-12-04T09:33:41.7486691Z  * [new branch]              gh/oulgen/13/head           -> origin/gh/oulgen/13/head
2025-12-04T09:33:41.7487930Z  * [new branch]              gh/oulgen/13/orig           -> origin/gh/oulgen/13/orig
2025-12-04T09:33:41.7489672Z  * [new branch]              gh/oulgen/14/base           -> origin/gh/oulgen/14/base
2025-12-04T09:33:41.7491054Z  * [new branch]              gh/oulgen/14/head           -> origin/gh/oulgen/14/head
2025-12-04T09:33:41.7492332Z  * [new branch]              gh/oulgen/14/orig           -> origin/gh/oulgen/14/orig
2025-12-04T09:33:41.7494033Z  * [new branch]              gh/oulgen/15/base           -> origin/gh/oulgen/15/base
2025-12-04T09:33:41.7495283Z  * [new branch]              gh/oulgen/15/head           -> origin/gh/oulgen/15/head
2025-12-04T09:33:41.7497237Z  * [new branch]              gh/oulgen/15/orig           -> origin/gh/oulgen/15/orig
2025-12-04T09:33:41.7498641Z  * [new branch]              gh/oulgen/16/base           -> origin/gh/oulgen/16/base
2025-12-04T09:33:41.7499843Z  * [new branch]              gh/oulgen/16/head           -> origin/gh/oulgen/16/head
2025-12-04T09:33:41.7501316Z  * [new branch]              gh/oulgen/16/orig           -> origin/gh/oulgen/16/orig
2025-12-04T09:33:41.7503052Z  * [new branch]              gh/oulgen/17/base           -> origin/gh/oulgen/17/base
2025-12-04T09:33:41.7504314Z  * [new branch]              gh/oulgen/17/head           -> origin/gh/oulgen/17/head
2025-12-04T09:33:41.7505635Z  * [new branch]              gh/oulgen/17/orig           -> origin/gh/oulgen/17/orig
2025-12-04T09:33:41.7507412Z  * [new branch]              gh/oulgen/18/base           -> origin/gh/oulgen/18/base
2025-12-04T09:33:41.7508753Z  * [new branch]              gh/oulgen/18/head           -> origin/gh/oulgen/18/head
2025-12-04T09:33:41.7510162Z  * [new branch]              gh/oulgen/18/orig           -> origin/gh/oulgen/18/orig
2025-12-04T09:33:41.7511661Z  * [new branch]              gh/oulgen/19/base           -> origin/gh/oulgen/19/base
2025-12-04T09:33:41.7512945Z  * [new branch]              gh/oulgen/19/head           -> origin/gh/oulgen/19/head
2025-12-04T09:33:41.7514183Z  * [new branch]              gh/oulgen/19/orig           -> origin/gh/oulgen/19/orig
2025-12-04T09:33:41.7515916Z  * [new branch]              gh/oulgen/20/base           -> origin/gh/oulgen/20/base
2025-12-04T09:33:41.7517167Z  * [new branch]              gh/oulgen/20/head           -> origin/gh/oulgen/20/head
2025-12-04T09:33:41.7518482Z  * [new branch]              gh/oulgen/20/orig           -> origin/gh/oulgen/20/orig
2025-12-04T09:33:41.7520404Z  * [new branch]              gh/oulgen/21/base           -> origin/gh/oulgen/21/base
2025-12-04T09:33:41.7521246Z  * [new branch]              gh/oulgen/21/head           -> origin/gh/oulgen/21/head
2025-12-04T09:33:41.7523153Z  * [new branch]              gh/oulgen/21/orig           -> origin/gh/oulgen/21/orig
2025-12-04T09:33:41.7524865Z  * [new branch]              gh/oulgen/22/base           -> origin/gh/oulgen/22/base
2025-12-04T09:33:41.7526199Z  * [new branch]              gh/oulgen/22/head           -> origin/gh/oulgen/22/head
2025-12-04T09:33:41.7527459Z  * [new branch]              gh/oulgen/22/orig           -> origin/gh/oulgen/22/orig
2025-12-04T09:33:41.7529138Z  * [new branch]              gh/oulgen/23/base           -> origin/gh/oulgen/23/base
2025-12-04T09:33:41.7530359Z  * [new branch]              gh/oulgen/23/head           -> origin/gh/oulgen/23/head
2025-12-04T09:33:41.7531609Z  * [new branch]              gh/oulgen/23/orig           -> origin/gh/oulgen/23/orig
2025-12-04T09:33:41.7533202Z  * [new branch]              gh/oulgen/24/base           -> origin/gh/oulgen/24/base
2025-12-04T09:33:41.7534470Z  * [new branch]              gh/oulgen/24/head           -> origin/gh/oulgen/24/head
2025-12-04T09:33:41.7535739Z  * [new branch]              gh/oulgen/24/orig           -> origin/gh/oulgen/24/orig
2025-12-04T09:33:41.7537374Z  * [new branch]              gh/oulgen/25/base           -> origin/gh/oulgen/25/base
2025-12-04T09:33:41.7538651Z  * [new branch]              gh/oulgen/25/head           -> origin/gh/oulgen/25/head
2025-12-04T09:33:41.7539914Z  * [new branch]              gh/oulgen/25/orig           -> origin/gh/oulgen/25/orig
2025-12-04T09:33:41.7541575Z  * [new branch]              gh/oulgen/26/base           -> origin/gh/oulgen/26/base
2025-12-04T09:33:41.7542924Z  * [new branch]              gh/oulgen/26/head           -> origin/gh/oulgen/26/head
2025-12-04T09:33:41.7544272Z  * [new branch]              gh/oulgen/26/orig           -> origin/gh/oulgen/26/orig
2025-12-04T09:33:41.7545991Z  * [new branch]              gh/oulgen/4/base            -> origin/gh/oulgen/4/base
2025-12-04T09:33:41.7547237Z  * [new branch]              gh/oulgen/4/head            -> origin/gh/oulgen/4/head
2025-12-04T09:33:41.7548487Z  * [new branch]              gh/oulgen/4/orig            -> origin/gh/oulgen/4/orig
2025-12-04T09:33:41.7550578Z  * [new branch]              gh/oulgen/7/base            -> origin/gh/oulgen/7/base
2025-12-04T09:33:41.7551867Z  * [new branch]              gh/oulgen/7/head            -> origin/gh/oulgen/7/head
2025-12-04T09:33:41.7553119Z  * [new branch]              gh/oulgen/7/orig            -> origin/gh/oulgen/7/orig
2025-12-04T09:33:41.7554877Z  * [new branch]              gh/oulgen/8/base            -> origin/gh/oulgen/8/base
2025-12-04T09:33:41.7556176Z  * [new branch]              gh/oulgen/8/head            -> origin/gh/oulgen/8/head
2025-12-04T09:33:41.7557409Z  * [new branch]              gh/oulgen/8/orig            -> origin/gh/oulgen/8/orig
2025-12-04T09:33:41.7559031Z  * [new branch]              gh/oulgen/9/base            -> origin/gh/oulgen/9/base
2025-12-04T09:33:41.7560391Z  * [new branch]              gh/oulgen/9/head            -> origin/gh/oulgen/9/head
2025-12-04T09:33:41.7561732Z  * [new branch]              gh/oulgen/9/orig            -> origin/gh/oulgen/9/orig
2025-12-04T09:33:41.7563579Z  * [new branch]              gh/patvig/mtia-serialization -> origin/gh/patvig/mtia-serialization
2025-12-04T09:33:41.7565711Z  * [new branch]              gh/pearu/108/base           -> origin/gh/pearu/108/base
2025-12-04T09:33:41.7567076Z  * [new branch]              gh/pearu/108/head           -> origin/gh/pearu/108/head
2025-12-04T09:33:41.7568501Z  * [new branch]              gh/pearu/108/orig           -> origin/gh/pearu/108/orig
2025-12-04T09:33:41.7570151Z  * [new branch]              gh/pearu/109/base           -> origin/gh/pearu/109/base
2025-12-04T09:33:41.7571444Z  * [new branch]              gh/pearu/109/head           -> origin/gh/pearu/109/head
2025-12-04T09:33:41.7572735Z  * [new branch]              gh/pearu/109/orig           -> origin/gh/pearu/109/orig
2025-12-04T09:33:41.7574496Z  * [new branch]              gh/pearu/110/base           -> origin/gh/pearu/110/base
2025-12-04T09:33:41.7575830Z  * [new branch]              gh/pearu/110/head           -> origin/gh/pearu/110/head
2025-12-04T09:33:41.7577223Z  * [new branch]              gh/pearu/110/orig           -> origin/gh/pearu/110/orig
2025-12-04T09:33:41.7578902Z  * [new branch]              gh/pearu/111/base           -> origin/gh/pearu/111/base
2025-12-04T09:33:41.7580171Z  * [new branch]              gh/pearu/111/head           -> origin/gh/pearu/111/head
2025-12-04T09:33:41.7581472Z  * [new branch]              gh/pearu/111/orig           -> origin/gh/pearu/111/orig
2025-12-04T09:33:41.7583215Z  * [new branch]              gh/pearu/112/base           -> origin/gh/pearu/112/base
2025-12-04T09:33:41.7584555Z  * [new branch]              gh/pearu/112/head           -> origin/gh/pearu/112/head
2025-12-04T09:33:41.7585873Z  * [new branch]              gh/pearu/112/orig           -> origin/gh/pearu/112/orig
2025-12-04T09:33:41.7587463Z  * [new branch]              gh/pearu/115/base           -> origin/gh/pearu/115/base
2025-12-04T09:33:41.7588727Z  * [new branch]              gh/pearu/115/head           -> origin/gh/pearu/115/head
2025-12-04T09:33:41.7589999Z  * [new branch]              gh/pearu/115/orig           -> origin/gh/pearu/115/orig
2025-12-04T09:33:41.7591589Z  * [new branch]              gh/pearu/116/base           -> origin/gh/pearu/116/base
2025-12-04T09:33:41.7592822Z  * [new branch]              gh/pearu/116/head           -> origin/gh/pearu/116/head
2025-12-04T09:33:41.7594204Z  * [new branch]              gh/pearu/116/orig           -> origin/gh/pearu/116/orig
2025-12-04T09:33:41.7595902Z  * [new branch]              gh/pearu/117/base           -> origin/gh/pearu/117/base
2025-12-04T09:33:41.7597170Z  * [new branch]              gh/pearu/117/head           -> origin/gh/pearu/117/head
2025-12-04T09:33:41.7598469Z  * [new branch]              gh/pearu/117/orig           -> origin/gh/pearu/117/orig
2025-12-04T09:33:41.7600182Z  * [new branch]              gh/pearu/118/base           -> origin/gh/pearu/118/base
2025-12-04T09:33:41.7601694Z  * [new branch]              gh/pearu/118/head           -> origin/gh/pearu/118/head
2025-12-04T09:33:41.7603463Z  * [new branch]              gh/pearu/118/orig           -> origin/gh/pearu/118/orig
2025-12-04T09:33:41.7605447Z  * [new branch]              gh/pearu/119/base           -> origin/gh/pearu/119/base
2025-12-04T09:33:41.7606699Z  * [new branch]              gh/pearu/119/head           -> origin/gh/pearu/119/head
2025-12-04T09:33:41.7608493Z  * [new branch]              gh/pearu/119/orig           -> origin/gh/pearu/119/orig
2025-12-04T09:33:41.7610201Z  * [new branch]              gh/pearu/139/base           -> origin/gh/pearu/139/base
2025-12-04T09:33:41.7611456Z  * [new branch]              gh/pearu/139/head           -> origin/gh/pearu/139/head
2025-12-04T09:33:41.7613310Z  * [new branch]              gh/pearu/139/orig           -> origin/gh/pearu/139/orig
2025-12-04T09:33:41.7615039Z  * [new branch]              gh/pearu/140/base           -> origin/gh/pearu/140/base
2025-12-04T09:33:41.7616430Z  * [new branch]              gh/pearu/140/head           -> origin/gh/pearu/140/head
2025-12-04T09:33:41.7617620Z  * [new branch]              gh/pearu/140/orig           -> origin/gh/pearu/140/orig
2025-12-04T09:33:41.7619318Z  * [new branch]              gh/pearu/142/base           -> origin/gh/pearu/142/base
2025-12-04T09:33:41.7620664Z  * [new branch]              gh/pearu/142/head           -> origin/gh/pearu/142/head
2025-12-04T09:33:41.7621921Z  * [new branch]              gh/pearu/142/orig           -> origin/gh/pearu/142/orig
2025-12-04T09:33:41.7623598Z  * [new branch]              gh/pearu/143/base           -> origin/gh/pearu/143/base
2025-12-04T09:33:41.7624865Z  * [new branch]              gh/pearu/143/head           -> origin/gh/pearu/143/head
2025-12-04T09:33:41.7626226Z  * [new branch]              gh/pearu/143/orig           -> origin/gh/pearu/143/orig
2025-12-04T09:33:41.7627899Z  * [new branch]              gh/pearu/147/base           -> origin/gh/pearu/147/base
2025-12-04T09:33:41.7629401Z  * [new branch]              gh/pearu/147/head           -> origin/gh/pearu/147/head
2025-12-04T09:33:41.7630784Z  * [new branch]              gh/pearu/147/orig           -> origin/gh/pearu/147/orig
2025-12-04T09:33:41.7632492Z  * [new branch]              gh/pearu/149/base           -> origin/gh/pearu/149/base
2025-12-04T09:33:41.7634250Z  * [new branch]              gh/pearu/149/head           -> origin/gh/pearu/149/head
2025-12-04T09:33:41.7635499Z  * [new branch]              gh/pearu/149/orig           -> origin/gh/pearu/149/orig
2025-12-04T09:33:41.7637671Z  * [new branch]              gh/pearu/150/base           -> origin/gh/pearu/150/base
2025-12-04T09:33:41.7638971Z  * [new branch]              gh/pearu/150/head           -> origin/gh/pearu/150/head
2025-12-04T09:33:41.7640209Z  * [new branch]              gh/pearu/150/orig           -> origin/gh/pearu/150/orig
2025-12-04T09:33:41.7641997Z  * [new branch]              gh/pearu/151/base           -> origin/gh/pearu/151/base
2025-12-04T09:33:41.7643450Z  * [new branch]              gh/pearu/151/head           -> origin/gh/pearu/151/head
2025-12-04T09:33:41.7644793Z  * [new branch]              gh/pearu/151/orig           -> origin/gh/pearu/151/orig
2025-12-04T09:33:41.7646978Z  * [new branch]              gh/pearu/152/base           -> origin/gh/pearu/152/base
2025-12-04T09:33:41.7648376Z  * [new branch]              gh/pearu/152/head           -> origin/gh/pearu/152/head
2025-12-04T09:33:41.7649632Z  * [new branch]              gh/pearu/152/orig           -> origin/gh/pearu/152/orig
2025-12-04T09:33:41.7651398Z  * [new branch]              gh/pearu/153/base           -> origin/gh/pearu/153/base
2025-12-04T09:33:41.7652647Z  * [new branch]              gh/pearu/153/head           -> origin/gh/pearu/153/head
2025-12-04T09:33:41.7653911Z  * [new branch]              gh/pearu/153/orig           -> origin/gh/pearu/153/orig
2025-12-04T09:33:41.7656103Z  * [new branch]              gh/pearu/154/base           -> origin/gh/pearu/154/base
2025-12-04T09:33:41.7657366Z  * [new branch]              gh/pearu/154/head           -> origin/gh/pearu/154/head
2025-12-04T09:33:41.7658662Z  * [new branch]              gh/pearu/154/orig           -> origin/gh/pearu/154/orig
2025-12-04T09:33:41.7660386Z  * [new branch]              gh/pearu/155/base           -> origin/gh/pearu/155/base
2025-12-04T09:33:41.7661697Z  * [new branch]              gh/pearu/155/head           -> origin/gh/pearu/155/head
2025-12-04T09:33:41.7662944Z  * [new branch]              gh/pearu/155/orig           -> origin/gh/pearu/155/orig
2025-12-04T09:33:41.7664701Z  * [new branch]              gh/pearu/156/base           -> origin/gh/pearu/156/base
2025-12-04T09:33:41.7666030Z  * [new branch]              gh/pearu/156/head           -> origin/gh/pearu/156/head
2025-12-04T09:33:41.7667381Z  * [new branch]              gh/pearu/156/orig           -> origin/gh/pearu/156/orig
2025-12-04T09:33:41.7669616Z  * [new branch]              gh/pearu/56/base            -> origin/gh/pearu/56/base
2025-12-04T09:33:41.7671173Z  * [new branch]              gh/pearu/56/head            -> origin/gh/pearu/56/head
2025-12-04T09:33:41.7672421Z  * [new branch]              gh/pearu/56/orig            -> origin/gh/pearu/56/orig
2025-12-04T09:33:41.7674351Z  * [new branch]              gh/pearu/97/base            -> origin/gh/pearu/97/base
2025-12-04T09:33:41.7675645Z  * [new branch]              gh/pearu/97/head            -> origin/gh/pearu/97/head
2025-12-04T09:33:41.7677015Z  * [new branch]              gh/pearu/97/orig            -> origin/gh/pearu/97/orig
2025-12-04T09:33:41.7679014Z  * [new branch]              gh/pianpwk/21/base          -> origin/gh/pianpwk/21/base
2025-12-04T09:33:41.7680287Z  * [new branch]              gh/pianpwk/21/head          -> origin/gh/pianpwk/21/head
2025-12-04T09:33:41.7682020Z  * [new branch]              gh/pianpwk/28/base          -> origin/gh/pianpwk/28/base
2025-12-04T09:33:41.7683444Z  * [new branch]              gh/pianpwk/28/head          -> origin/gh/pianpwk/28/head
2025-12-04T09:33:41.7684861Z  * [new branch]              gh/pianpwk/28/orig          -> origin/gh/pianpwk/28/orig
2025-12-04T09:33:41.7686549Z  * [new branch]              gh/pianpwk/29/base          -> origin/gh/pianpwk/29/base
2025-12-04T09:33:41.7688047Z  * [new branch]              gh/pianpwk/29/head          -> origin/gh/pianpwk/29/head
2025-12-04T09:33:41.7689298Z  * [new branch]              gh/pianpwk/29/orig          -> origin/gh/pianpwk/29/orig
2025-12-04T09:33:41.7691162Z  * [new branch]              gh/pianpwk/30/base          -> origin/gh/pianpwk/30/base
2025-12-04T09:33:41.7692436Z  * [new branch]              gh/pianpwk/30/head          -> origin/gh/pianpwk/30/head
2025-12-04T09:33:41.7693760Z  * [new branch]              gh/pianpwk/30/orig          -> origin/gh/pianpwk/30/orig
2025-12-04T09:33:41.7695634Z  * [new branch]              gh/pianpwk/31/base          -> origin/gh/pianpwk/31/base
2025-12-04T09:33:41.7696891Z  * [new branch]              gh/pianpwk/31/head          -> origin/gh/pianpwk/31/head
2025-12-04T09:33:41.7698175Z  * [new branch]              gh/pianpwk/31/orig          -> origin/gh/pianpwk/31/orig
2025-12-04T09:33:41.7699694Z  * [new branch]              gh/pianpwk/32/base          -> origin/gh/pianpwk/32/base
2025-12-04T09:33:41.7701128Z  * [new branch]              gh/pianpwk/32/head          -> origin/gh/pianpwk/32/head
2025-12-04T09:33:41.7702626Z  * [new branch]              gh/pianpwk/32/orig          -> origin/gh/pianpwk/32/orig
2025-12-04T09:33:41.7704139Z  * [new branch]              gh/pianpwk/33/base          -> origin/gh/pianpwk/33/base
2025-12-04T09:33:41.7705410Z  * [new branch]              gh/pianpwk/33/head          -> origin/gh/pianpwk/33/head
2025-12-04T09:33:41.7706660Z  * [new branch]              gh/pianpwk/33/orig          -> origin/gh/pianpwk/33/orig
2025-12-04T09:33:41.7708723Z  * [new branch]              gh/pianpwk/34/base          -> origin/gh/pianpwk/34/base
2025-12-04T09:33:41.7710285Z  * [new branch]              gh/pianpwk/34/head          -> origin/gh/pianpwk/34/head
2025-12-04T09:33:41.7711909Z  * [new branch]              gh/pianpwk/34/orig          -> origin/gh/pianpwk/34/orig
2025-12-04T09:33:41.7713607Z  * [new branch]              gh/pianpwk/35/base          -> origin/gh/pianpwk/35/base
2025-12-04T09:33:41.7714932Z  * [new branch]              gh/pianpwk/35/head          -> origin/gh/pianpwk/35/head
2025-12-04T09:33:41.7716317Z  * [new branch]              gh/pianpwk/35/orig          -> origin/gh/pianpwk/35/orig
2025-12-04T09:33:41.7718294Z  * [new branch]              gh/rec/141/base             -> origin/gh/rec/141/base
2025-12-04T09:33:41.7719670Z  * [new branch]              gh/rec/141/head             -> origin/gh/rec/141/head
2025-12-04T09:33:41.7721363Z  * [new branch]              gh/rec/153/base             -> origin/gh/rec/153/base
2025-12-04T09:33:41.7722707Z  * [new branch]              gh/rec/153/head             -> origin/gh/rec/153/head
2025-12-04T09:33:41.7724026Z  * [new branch]              gh/rec/153/orig             -> origin/gh/rec/153/orig
2025-12-04T09:33:41.7726237Z  * [new branch]              gh/rec/154/base             -> origin/gh/rec/154/base
2025-12-04T09:33:41.7727416Z  * [new branch]              gh/rec/154/head             -> origin/gh/rec/154/head
2025-12-04T09:33:41.7728680Z  * [new branch]              gh/rec/154/orig             -> origin/gh/rec/154/orig
2025-12-04T09:33:41.7730334Z  * [new branch]              gh/rec/164/base             -> origin/gh/rec/164/base
2025-12-04T09:33:41.7731579Z  * [new branch]              gh/rec/164/head             -> origin/gh/rec/164/head
2025-12-04T09:33:41.7732843Z  * [new branch]              gh/rec/164/orig             -> origin/gh/rec/164/orig
2025-12-04T09:33:41.7734525Z  * [new branch]              gh/rec/166/base             -> origin/gh/rec/166/base
2025-12-04T09:33:41.7735735Z  * [new branch]              gh/rec/166/head             -> origin/gh/rec/166/head
2025-12-04T09:33:41.7737131Z  * [new branch]              gh/rec/166/orig             -> origin/gh/rec/166/orig
2025-12-04T09:33:41.7738820Z  * [new branch]              gh/rec/167/base             -> origin/gh/rec/167/base
2025-12-04T09:33:41.7740114Z  * [new branch]              gh/rec/167/head             -> origin/gh/rec/167/head
2025-12-04T09:33:41.7741356Z  * [new branch]              gh/rec/167/orig             -> origin/gh/rec/167/orig
2025-12-04T09:33:41.7743059Z  * [new branch]              gh/rec/168/base             -> origin/gh/rec/168/base
2025-12-04T09:33:41.7744312Z  * [new branch]              gh/rec/168/head             -> origin/gh/rec/168/head
2025-12-04T09:33:41.7745586Z  * [new branch]              gh/rec/168/orig             -> origin/gh/rec/168/orig
2025-12-04T09:33:41.7747228Z  * [new branch]              gh/rec/169/base             -> origin/gh/rec/169/base
2025-12-04T09:33:41.7748448Z  * [new branch]              gh/rec/169/head             -> origin/gh/rec/169/head
2025-12-04T09:33:41.7749730Z  * [new branch]              gh/rec/169/orig             -> origin/gh/rec/169/orig
2025-12-04T09:33:41.7751930Z  * [new branch]              gh/rec/170/base             -> origin/gh/rec/170/base
2025-12-04T09:33:41.7753188Z  * [new branch]              gh/rec/170/head             -> origin/gh/rec/170/head
2025-12-04T09:33:41.7754653Z  * [new branch]              gh/rec/170/orig             -> origin/gh/rec/170/orig
2025-12-04T09:33:41.7756305Z  * [new branch]              gh/rec/171/base             -> origin/gh/rec/171/base
2025-12-04T09:33:41.7757570Z  * [new branch]              gh/rec/171/head             -> origin/gh/rec/171/head
2025-12-04T09:33:41.7758876Z  * [new branch]              gh/rec/171/orig             -> origin/gh/rec/171/orig
2025-12-04T09:33:41.7760485Z  * [new branch]              gh/rec/172/base             -> origin/gh/rec/172/base
2025-12-04T09:33:41.7761757Z  * [new branch]              gh/rec/172/head             -> origin/gh/rec/172/head
2025-12-04T09:33:41.7763061Z  * [new branch]              gh/rec/172/orig             -> origin/gh/rec/172/orig
2025-12-04T09:33:41.7764835Z  * [new branch]              gh/rec/173/base             -> origin/gh/rec/173/base
2025-12-04T09:33:41.7766058Z  * [new branch]              gh/rec/173/head             -> origin/gh/rec/173/head
2025-12-04T09:33:41.7767402Z  * [new branch]              gh/rec/173/orig             -> origin/gh/rec/173/orig
2025-12-04T09:33:41.7769025Z  * [new branch]              gh/rec/174/base             -> origin/gh/rec/174/base
2025-12-04T09:33:41.7770295Z  * [new branch]              gh/rec/174/head             -> origin/gh/rec/174/head
2025-12-04T09:33:41.7771675Z  * [new branch]              gh/rec/174/orig             -> origin/gh/rec/174/orig
2025-12-04T09:33:41.7773307Z  * [new branch]              gh/rec/175/base             -> origin/gh/rec/175/base
2025-12-04T09:33:41.7774558Z  * [new branch]              gh/rec/175/head             -> origin/gh/rec/175/head
2025-12-04T09:33:41.7775811Z  * [new branch]              gh/rec/175/orig             -> origin/gh/rec/175/orig
2025-12-04T09:33:41.7777513Z  * [new branch]              gh/rec/176/base             -> origin/gh/rec/176/base
2025-12-04T09:33:41.7778673Z  * [new branch]              gh/rec/176/head             -> origin/gh/rec/176/head
2025-12-04T09:33:41.7779978Z  * [new branch]              gh/rec/176/orig             -> origin/gh/rec/176/orig
2025-12-04T09:33:41.7781608Z  * [new branch]              gh/rec/177/base             -> origin/gh/rec/177/base
2025-12-04T09:33:41.7782847Z  * [new branch]              gh/rec/177/head             -> origin/gh/rec/177/head
2025-12-04T09:33:41.7784069Z  * [new branch]              gh/rec/177/orig             -> origin/gh/rec/177/orig
2025-12-04T09:33:41.7786168Z  * [new branch]              gh/robert-hardwick/3/base   -> origin/gh/robert-hardwick/3/base
2025-12-04T09:33:41.7787535Z  * [new branch]              gh/robert-hardwick/3/head   -> origin/gh/robert-hardwick/3/head
2025-12-04T09:33:41.7788817Z  * [new branch]              gh/robert-hardwick/3/orig   -> origin/gh/robert-hardwick/3/orig
2025-12-04T09:33:41.7791020Z  * [new branch]              gh/robert-hardwick/4/base   -> origin/gh/robert-hardwick/4/base
2025-12-04T09:33:41.7792274Z  * [new branch]              gh/robert-hardwick/4/head   -> origin/gh/robert-hardwick/4/head
2025-12-04T09:33:41.7793557Z  * [new branch]              gh/robert-hardwick/4/orig   -> origin/gh/robert-hardwick/4/orig
2025-12-04T09:33:41.7795235Z  * [new branch]              gh/robert-hardwick/5/base   -> origin/gh/robert-hardwick/5/base
2025-12-04T09:33:41.7796513Z  * [new branch]              gh/robert-hardwick/5/head   -> origin/gh/robert-hardwick/5/head
2025-12-04T09:33:41.7797837Z  * [new branch]              gh/robert-hardwick/5/orig   -> origin/gh/robert-hardwick/5/orig
2025-12-04T09:33:41.7799521Z  * [new branch]              gh/robert-hardwick/6/base   -> origin/gh/robert-hardwick/6/base
2025-12-04T09:33:41.7800776Z  * [new branch]              gh/robert-hardwick/6/head   -> origin/gh/robert-hardwick/6/head
2025-12-04T09:33:41.7802357Z  * [new branch]              gh/robert-hardwick/6/orig   -> origin/gh/robert-hardwick/6/orig
2025-12-04T09:33:41.7804181Z  * [new branch]              gh/robert-hardwick/7/base   -> origin/gh/robert-hardwick/7/base
2025-12-04T09:33:41.7805523Z  * [new branch]              gh/robert-hardwick/7/head   -> origin/gh/robert-hardwick/7/head
2025-12-04T09:33:41.7806813Z  * [new branch]              gh/robert-hardwick/7/orig   -> origin/gh/robert-hardwick/7/orig
2025-12-04T09:33:41.7808486Z  * [new branch]              gh/robert-hardwick/8/base   -> origin/gh/robert-hardwick/8/base
2025-12-04T09:33:41.7809728Z  * [new branch]              gh/robert-hardwick/8/head   -> origin/gh/robert-hardwick/8/head
2025-12-04T09:33:41.7811004Z  * [new branch]              gh/robert-hardwick/8/orig   -> origin/gh/robert-hardwick/8/orig
2025-12-04T09:33:41.7812675Z  * [new branch]              gh/robert-hardwick/9/base   -> origin/gh/robert-hardwick/9/base
2025-12-04T09:33:41.7813973Z  * [new branch]              gh/robert-hardwick/9/head   -> origin/gh/robert-hardwick/9/head
2025-12-04T09:33:41.7815201Z  * [new branch]              gh/robert-hardwick/9/orig   -> origin/gh/robert-hardwick/9/orig
2025-12-04T09:33:41.7817271Z  * [new branch]              gh/rtimpe/1/base            -> origin/gh/rtimpe/1/base
2025-12-04T09:33:41.7818576Z  * [new branch]              gh/rtimpe/1/head            -> origin/gh/rtimpe/1/head
2025-12-04T09:33:41.7820279Z  * [new branch]              gh/rtimpe/2/base            -> origin/gh/rtimpe/2/base
2025-12-04T09:33:41.7821565Z  * [new branch]              gh/rtimpe/2/head            -> origin/gh/rtimpe/2/head
2025-12-04T09:33:41.7823274Z  * [new branch]              gh/rtimpe/22/base           -> origin/gh/rtimpe/22/base
2025-12-04T09:33:41.7824557Z  * [new branch]              gh/rtimpe/22/head           -> origin/gh/rtimpe/22/head
2025-12-04T09:33:41.7825838Z  * [new branch]              gh/rtimpe/22/orig           -> origin/gh/rtimpe/22/orig
2025-12-04T09:33:41.7827410Z  * [new branch]              gh/rtimpe/23/base           -> origin/gh/rtimpe/23/base
2025-12-04T09:33:41.7828788Z  * [new branch]              gh/rtimpe/23/head           -> origin/gh/rtimpe/23/head
2025-12-04T09:33:41.7829946Z  * [new branch]              gh/rtimpe/23/orig           -> origin/gh/rtimpe/23/orig
2025-12-04T09:33:41.7831598Z  * [new branch]              gh/rtimpe/24/base           -> origin/gh/rtimpe/24/base
2025-12-04T09:33:41.7832863Z  * [new branch]              gh/rtimpe/24/head           -> origin/gh/rtimpe/24/head
2025-12-04T09:33:41.7834120Z  * [new branch]              gh/rtimpe/24/orig           -> origin/gh/rtimpe/24/orig
2025-12-04T09:33:41.7835763Z  * [new branch]              gh/rtimpe/25/base           -> origin/gh/rtimpe/25/base
2025-12-04T09:33:41.7837111Z  * [new branch]              gh/rtimpe/25/head           -> origin/gh/rtimpe/25/head
2025-12-04T09:33:41.7838483Z  * [new branch]              gh/rtimpe/25/orig           -> origin/gh/rtimpe/25/orig
2025-12-04T09:33:41.7840174Z  * [new branch]              gh/rtimpe/26/base           -> origin/gh/rtimpe/26/base
2025-12-04T09:33:41.7841457Z  * [new branch]              gh/rtimpe/26/head           -> origin/gh/rtimpe/26/head
2025-12-04T09:33:41.7843338Z  * [new branch]              gh/rtimpe/26/orig           -> origin/gh/rtimpe/26/orig
2025-12-04T09:33:41.7844934Z  * [new branch]              gh/rtimpe/27/base           -> origin/gh/rtimpe/27/base
2025-12-04T09:33:41.7846200Z  * [new branch]              gh/rtimpe/27/head           -> origin/gh/rtimpe/27/head
2025-12-04T09:33:41.7847856Z  * [new branch]              gh/rtimpe/27/orig           -> origin/gh/rtimpe/27/orig
2025-12-04T09:33:41.7850008Z  * [new branch]              gh/rtimpe/28/base           -> origin/gh/rtimpe/28/base
2025-12-04T09:33:41.7851230Z  * [new branch]              gh/rtimpe/28/head           -> origin/gh/rtimpe/28/head
2025-12-04T09:33:41.7852555Z  * [new branch]              gh/rtimpe/28/orig           -> origin/gh/rtimpe/28/orig
2025-12-04T09:33:41.7854263Z  * [new branch]              gh/rtimpe/29/base           -> origin/gh/rtimpe/29/base
2025-12-04T09:33:41.7855558Z  * [new branch]              gh/rtimpe/29/head           -> origin/gh/rtimpe/29/head
2025-12-04T09:33:41.7857191Z  * [new branch]              gh/rtimpe/29/orig           -> origin/gh/rtimpe/29/orig
2025-12-04T09:33:41.7858826Z  * [new branch]              gh/rtimpe/3/base            -> origin/gh/rtimpe/3/base
2025-12-04T09:33:41.7860026Z  * [new branch]              gh/rtimpe/3/head            -> origin/gh/rtimpe/3/head
2025-12-04T09:33:41.7861691Z  * [new branch]              gh/rtimpe/30/base           -> origin/gh/rtimpe/30/base
2025-12-04T09:33:41.7863417Z  * [new branch]              gh/rtimpe/30/head           -> origin/gh/rtimpe/30/head
2025-12-04T09:33:41.7864690Z  * [new branch]              gh/rtimpe/30/orig           -> origin/gh/rtimpe/30/orig
2025-12-04T09:33:41.7866371Z  * [new branch]              gh/rtimpe/31/base           -> origin/gh/rtimpe/31/base
2025-12-04T09:33:41.7867610Z  * [new branch]              gh/rtimpe/31/head           -> origin/gh/rtimpe/31/head
2025-12-04T09:33:41.7868976Z  * [new branch]              gh/rtimpe/31/orig           -> origin/gh/rtimpe/31/orig
2025-12-04T09:33:41.7870697Z  * [new branch]              gh/rtimpe/32/base           -> origin/gh/rtimpe/32/base
2025-12-04T09:33:41.7871915Z  * [new branch]              gh/rtimpe/32/head           -> origin/gh/rtimpe/32/head
2025-12-04T09:33:41.7873173Z  * [new branch]              gh/rtimpe/32/orig           -> origin/gh/rtimpe/32/orig
2025-12-04T09:33:41.7874924Z  * [new branch]              gh/rtimpe/33/base           -> origin/gh/rtimpe/33/base
2025-12-04T09:33:41.7876185Z  * [new branch]              gh/rtimpe/33/head           -> origin/gh/rtimpe/33/head
2025-12-04T09:33:41.7877443Z  * [new branch]              gh/rtimpe/33/orig           -> origin/gh/rtimpe/33/orig
2025-12-04T09:33:41.7879024Z  * [new branch]              gh/rtimpe/34/base           -> origin/gh/rtimpe/34/base
2025-12-04T09:33:41.7880307Z  * [new branch]              gh/rtimpe/34/head           -> origin/gh/rtimpe/34/head
2025-12-04T09:33:41.7881700Z  * [new branch]              gh/rtimpe/34/orig           -> origin/gh/rtimpe/34/orig
2025-12-04T09:33:41.7883547Z  * [new branch]              gh/rtimpe/35/base           -> origin/gh/rtimpe/35/base
2025-12-04T09:33:41.7884859Z  * [new branch]              gh/rtimpe/35/head           -> origin/gh/rtimpe/35/head
2025-12-04T09:33:41.7886155Z  * [new branch]              gh/rtimpe/35/orig           -> origin/gh/rtimpe/35/orig
2025-12-04T09:33:41.7887838Z  * [new branch]              gh/rtimpe/4/base            -> origin/gh/rtimpe/4/base
2025-12-04T09:33:41.7889164Z  * [new branch]              gh/rtimpe/4/head            -> origin/gh/rtimpe/4/head
2025-12-04T09:33:41.7891402Z  * [new branch]              gh/ruisizhang123/1/base     -> origin/gh/ruisizhang123/1/base
2025-12-04T09:33:41.7892698Z  * [new branch]              gh/ruisizhang123/1/head     -> origin/gh/ruisizhang123/1/head
2025-12-04T09:33:41.7893983Z  * [new branch]              gh/ruisizhang123/1/orig     -> origin/gh/ruisizhang123/1/orig
2025-12-04T09:33:41.7895669Z  * [new branch]              gh/ruisizhang123/4/base     -> origin/gh/ruisizhang123/4/base
2025-12-04T09:33:41.7896916Z  * [new branch]              gh/ruisizhang123/4/head     -> origin/gh/ruisizhang123/4/head
2025-12-04T09:33:41.7898353Z  * [new branch]              gh/ruisizhang123/4/orig     -> origin/gh/ruisizhang123/4/orig
2025-12-04T09:33:41.7900055Z  * [new branch]              gh/ruisizhang123/5/base     -> origin/gh/ruisizhang123/5/base
2025-12-04T09:33:41.7901587Z  * [new branch]              gh/ruisizhang123/5/head     -> origin/gh/ruisizhang123/5/head
2025-12-04T09:33:41.7902917Z  * [new branch]              gh/ruisizhang123/5/orig     -> origin/gh/ruisizhang123/5/orig
2025-12-04T09:33:41.7904592Z  * [new branch]              gh/ruisizhang123/6/base     -> origin/gh/ruisizhang123/6/base
2025-12-04T09:33:41.7905839Z  * [new branch]              gh/ruisizhang123/6/head     -> origin/gh/ruisizhang123/6/head
2025-12-04T09:33:41.7907142Z  * [new branch]              gh/ruisizhang123/6/orig     -> origin/gh/ruisizhang123/6/orig
2025-12-04T09:33:41.7908966Z  * [new branch]              gh/ruisizhang123/7/base     -> origin/gh/ruisizhang123/7/base
2025-12-04T09:33:41.7910290Z  * [new branch]              gh/ruisizhang123/7/head     -> origin/gh/ruisizhang123/7/head
2025-12-04T09:33:41.7911561Z  * [new branch]              gh/ruisizhang123/7/orig     -> origin/gh/ruisizhang123/7/orig
2025-12-04T09:33:41.7913131Z  * [new branch]              gh/ruisizhang123/8/base     -> origin/gh/ruisizhang123/8/base
2025-12-04T09:33:41.7914436Z  * [new branch]              gh/ruisizhang123/8/head     -> origin/gh/ruisizhang123/8/head
2025-12-04T09:33:41.7915701Z  * [new branch]              gh/ruisizhang123/8/orig     -> origin/gh/ruisizhang123/8/orig
2025-12-04T09:33:41.7917395Z  * [new branch]              gh/ruisizhang123/9/base     -> origin/gh/ruisizhang123/9/base
2025-12-04T09:33:41.7918726Z  * [new branch]              gh/ruisizhang123/9/head     -> origin/gh/ruisizhang123/9/head
2025-12-04T09:33:41.7920012Z  * [new branch]              gh/ruisizhang123/9/orig     -> origin/gh/ruisizhang123/9/orig
2025-12-04T09:33:41.7922672Z  * [new branch]              gh/seemethere/52/base       -> origin/gh/seemethere/52/base
2025-12-04T09:33:41.7923664Z  * [new branch]              gh/seemethere/52/head       -> origin/gh/seemethere/52/head
2025-12-04T09:33:41.7925161Z  * [new branch]              gh/seemethere/52/orig       -> origin/gh/seemethere/52/orig
2025-12-04T09:33:41.7926825Z  * [new branch]              gh/seemethere/53/base       -> origin/gh/seemethere/53/base
2025-12-04T09:33:41.7927978Z  * [new branch]              gh/seemethere/53/head       -> origin/gh/seemethere/53/head
2025-12-04T09:33:41.7929327Z  * [new branch]              gh/seemethere/53/orig       -> origin/gh/seemethere/53/orig
2025-12-04T09:33:41.7931109Z  * [new branch]              gh/seemethere/54/base       -> origin/gh/seemethere/54/base
2025-12-04T09:33:41.7932283Z  * [new branch]              gh/seemethere/54/head       -> origin/gh/seemethere/54/head
2025-12-04T09:33:41.7933676Z  * [new branch]              gh/seemethere/54/orig       -> origin/gh/seemethere/54/orig
2025-12-04T09:33:41.7935325Z  * [new branch]              gh/seemethere/55/base       -> origin/gh/seemethere/55/base
2025-12-04T09:33:41.7936407Z  * [new branch]              gh/seemethere/55/head       -> origin/gh/seemethere/55/head
2025-12-04T09:33:41.7937666Z  * [new branch]              gh/seemethere/55/orig       -> origin/gh/seemethere/55/orig
2025-12-04T09:33:41.7939362Z  * [new branch]              gh/seemethere/59/base       -> origin/gh/seemethere/59/base
2025-12-04T09:33:41.7940594Z  * [new branch]              gh/seemethere/59/head       -> origin/gh/seemethere/59/head
2025-12-04T09:33:41.7942085Z  * [new branch]              gh/seemethere/59/orig       -> origin/gh/seemethere/59/orig
2025-12-04T09:33:41.7943726Z  * [new branch]              gh/seemethere/62/base       -> origin/gh/seemethere/62/base
2025-12-04T09:33:41.7944896Z  * [new branch]              gh/seemethere/62/head       -> origin/gh/seemethere/62/head
2025-12-04T09:33:41.7946165Z  * [new branch]              gh/seemethere/62/orig       -> origin/gh/seemethere/62/orig
2025-12-04T09:33:41.7947906Z  * [new branch]              gh/seemethere/63/base       -> origin/gh/seemethere/63/base
2025-12-04T09:33:41.7949064Z  * [new branch]              gh/seemethere/63/head       -> origin/gh/seemethere/63/head
2025-12-04T09:33:41.7950344Z  * [new branch]              gh/seemethere/63/orig       -> origin/gh/seemethere/63/orig
2025-12-04T09:33:41.7952163Z  * [new branch]              gh/seemethere/71/base       -> origin/gh/seemethere/71/base
2025-12-04T09:33:41.7953439Z  * [new branch]              gh/seemethere/71/head       -> origin/gh/seemethere/71/head
2025-12-04T09:33:41.7954718Z  * [new branch]              gh/seemethere/71/orig       -> origin/gh/seemethere/71/orig
2025-12-04T09:33:41.7956536Z  * [new branch]              gh/seemethere/72/base       -> origin/gh/seemethere/72/base
2025-12-04T09:33:41.7957729Z  * [new branch]              gh/seemethere/72/head       -> origin/gh/seemethere/72/head
2025-12-04T09:33:41.7959201Z  * [new branch]              gh/seemethere/72/orig       -> origin/gh/seemethere/72/orig
2025-12-04T09:33:41.7960877Z  * [new branch]              gh/seemethere/73/base       -> origin/gh/seemethere/73/base
2025-12-04T09:33:41.7962021Z  * [new branch]              gh/seemethere/73/head       -> origin/gh/seemethere/73/head
2025-12-04T09:33:41.7963564Z  * [new branch]              gh/seemethere/73/orig       -> origin/gh/seemethere/73/orig
2025-12-04T09:33:41.7965234Z  * [new branch]              gh/seemethere/74/base       -> origin/gh/seemethere/74/base
2025-12-04T09:33:41.7966421Z  * [new branch]              gh/seemethere/74/head       -> origin/gh/seemethere/74/head
2025-12-04T09:33:41.7967745Z  * [new branch]              gh/seemethere/74/orig       -> origin/gh/seemethere/74/orig
2025-12-04T09:33:41.7969597Z  * [new branch]              gh/seemethere/75/base       -> origin/gh/seemethere/75/base
2025-12-04T09:33:41.7970778Z  * [new branch]              gh/seemethere/75/head       -> origin/gh/seemethere/75/head
2025-12-04T09:33:41.7972117Z  * [new branch]              gh/seemethere/75/orig       -> origin/gh/seemethere/75/orig
2025-12-04T09:33:41.7974015Z  * [new branch]              gh/seemethere/76/base       -> origin/gh/seemethere/76/base
2025-12-04T09:33:41.7975077Z  * [new branch]              gh/seemethere/76/head       -> origin/gh/seemethere/76/head
2025-12-04T09:33:41.7976552Z  * [new branch]              gh/seemethere/76/orig       -> origin/gh/seemethere/76/orig
2025-12-04T09:33:41.7978928Z  * [new branch]              gh/shunting314/145/base     -> origin/gh/shunting314/145/base
2025-12-04T09:33:41.7980252Z  * [new branch]              gh/shunting314/145/head     -> origin/gh/shunting314/145/head
2025-12-04T09:33:41.7981561Z  * [new branch]              gh/shunting314/145/orig     -> origin/gh/shunting314/145/orig
2025-12-04T09:33:41.7984263Z  * [new branch]              gh/shunting314/176/base     -> origin/gh/shunting314/176/base
2025-12-04T09:33:41.7985806Z  * [new branch]              gh/shunting314/176/head     -> origin/gh/shunting314/176/head
2025-12-04T09:33:41.7987021Z  * [new branch]              gh/shunting314/176/orig     -> origin/gh/shunting314/176/orig
2025-12-04T09:33:41.7988975Z  * [new branch]              gh/shunting314/249/base     -> origin/gh/shunting314/249/base
2025-12-04T09:33:41.7990233Z  * [new branch]              gh/shunting314/249/head     -> origin/gh/shunting314/249/head
2025-12-04T09:33:41.7991668Z  * [new branch]              gh/shunting314/249/orig     -> origin/gh/shunting314/249/orig
2025-12-04T09:33:41.7993564Z  * [new branch]              gh/shunting314/253/base     -> origin/gh/shunting314/253/base
2025-12-04T09:33:41.7994826Z  * [new branch]              gh/shunting314/253/head     -> origin/gh/shunting314/253/head
2025-12-04T09:33:41.7996033Z  * [new branch]              gh/shunting314/253/orig     -> origin/gh/shunting314/253/orig
2025-12-04T09:33:41.7997878Z  * [new branch]              gh/shunting314/256/base     -> origin/gh/shunting314/256/base
2025-12-04T09:33:41.7999076Z  * [new branch]              gh/shunting314/256/head     -> origin/gh/shunting314/256/head
2025-12-04T09:33:41.8000327Z  * [new branch]              gh/shunting314/256/orig     -> origin/gh/shunting314/256/orig
2025-12-04T09:33:41.8005058Z  * [new branch]              gh/shunting314/257/base     -> origin/gh/shunting314/257/base
2025-12-04T09:33:41.8006316Z  * [new branch]              gh/shunting314/257/head     -> origin/gh/shunting314/257/head
2025-12-04T09:33:41.8007643Z  * [new branch]              gh/shunting314/257/orig     -> origin/gh/shunting314/257/orig
2025-12-04T09:33:41.8009652Z  * [new branch]              gh/shunting314/258/base     -> origin/gh/shunting314/258/base
2025-12-04T09:33:41.8010844Z  * [new branch]              gh/shunting314/258/head     -> origin/gh/shunting314/258/head
2025-12-04T09:33:41.8012154Z  * [new branch]              gh/shunting314/258/orig     -> origin/gh/shunting314/258/orig
2025-12-04T09:33:41.8013870Z  * [new branch]              gh/shunting314/259/base     -> origin/gh/shunting314/259/base
2025-12-04T09:33:41.8015175Z  * [new branch]              gh/shunting314/259/head     -> origin/gh/shunting314/259/head
2025-12-04T09:33:41.8016432Z  * [new branch]              gh/shunting314/259/orig     -> origin/gh/shunting314/259/orig
2025-12-04T09:33:41.8018391Z  * [new branch]              gh/shunting314/260/base     -> origin/gh/shunting314/260/base
2025-12-04T09:33:41.8019720Z  * [new branch]              gh/shunting314/260/head     -> origin/gh/shunting314/260/head
2025-12-04T09:33:41.8021000Z  * [new branch]              gh/shunting314/260/orig     -> origin/gh/shunting314/260/orig
2025-12-04T09:33:41.8022963Z  * [new branch]              gh/shunting314/261/base     -> origin/gh/shunting314/261/base
2025-12-04T09:33:41.8024211Z  * [new branch]              gh/shunting314/261/head     -> origin/gh/shunting314/261/head
2025-12-04T09:33:41.8025514Z  * [new branch]              gh/shunting314/261/orig     -> origin/gh/shunting314/261/orig
2025-12-04T09:33:41.8027415Z  * [new branch]              gh/shunting314/262/base     -> origin/gh/shunting314/262/base
2025-12-04T09:33:41.8028653Z  * [new branch]              gh/shunting314/262/head     -> origin/gh/shunting314/262/head
2025-12-04T09:33:41.8029988Z  * [new branch]              gh/shunting314/262/orig     -> origin/gh/shunting314/262/orig
2025-12-04T09:33:41.8031878Z  * [new branch]              gh/shunting314/263/base     -> origin/gh/shunting314/263/base
2025-12-04T09:33:41.8033416Z  * [new branch]              gh/shunting314/263/head     -> origin/gh/shunting314/263/head
2025-12-04T09:33:41.8034625Z  * [new branch]              gh/shunting314/263/orig     -> origin/gh/shunting314/263/orig
2025-12-04T09:33:41.8036525Z  * [new branch]              gh/shunting314/264/base     -> origin/gh/shunting314/264/base
2025-12-04T09:33:41.8037859Z  * [new branch]              gh/shunting314/264/head     -> origin/gh/shunting314/264/head
2025-12-04T09:33:41.8039037Z  * [new branch]              gh/shunting314/264/orig     -> origin/gh/shunting314/264/orig
2025-12-04T09:33:41.8040911Z  * [new branch]              gh/shunting314/265/base     -> origin/gh/shunting314/265/base
2025-12-04T09:33:41.8042042Z  * [new branch]              gh/shunting314/265/head     -> origin/gh/shunting314/265/head
2025-12-04T09:33:41.8043528Z  * [new branch]              gh/shunting314/265/orig     -> origin/gh/shunting314/265/orig
2025-12-04T09:33:41.8045255Z  * [new branch]              gh/shunting314/266/base     -> origin/gh/shunting314/266/base
2025-12-04T09:33:41.8046762Z  * [new branch]              gh/shunting314/266/head     -> origin/gh/shunting314/266/head
2025-12-04T09:33:41.8048553Z  * [new branch]              gh/shunting314/266/orig     -> origin/gh/shunting314/266/orig
2025-12-04T09:33:41.8050560Z  * [new branch]              gh/shunting314/267/base     -> origin/gh/shunting314/267/base
2025-12-04T09:33:41.8052072Z  * [new branch]              gh/shunting314/267/head     -> origin/gh/shunting314/267/head
2025-12-04T09:33:41.8053297Z  * [new branch]              gh/shunting314/267/orig     -> origin/gh/shunting314/267/orig
2025-12-04T09:33:41.8055783Z  * [new branch]              gh/shunting314/268/base     -> origin/gh/shunting314/268/base
2025-12-04T09:33:41.8057028Z  * [new branch]              gh/shunting314/268/head     -> origin/gh/shunting314/268/head
2025-12-04T09:33:41.8058457Z  * [new branch]              gh/shunting314/268/orig     -> origin/gh/shunting314/268/orig
2025-12-04T09:33:41.8060300Z  * [new branch]              gh/shunting314/269/base     -> origin/gh/shunting314/269/base
2025-12-04T09:33:41.8061500Z  * [new branch]              gh/shunting314/269/head     -> origin/gh/shunting314/269/head
2025-12-04T09:33:41.8062793Z  * [new branch]              gh/shunting314/269/orig     -> origin/gh/shunting314/269/orig
2025-12-04T09:33:41.8064938Z  * [new branch]              gh/silverguo/1/base         -> origin/gh/silverguo/1/base
2025-12-04T09:33:41.8066126Z  * [new branch]              gh/silverguo/1/head         -> origin/gh/silverguo/1/head
2025-12-04T09:33:41.8067829Z  * [new branch]              gh/silverguo/2/base         -> origin/gh/silverguo/2/base
2025-12-04T09:33:41.8068987Z  * [new branch]              gh/silverguo/2/head         -> origin/gh/silverguo/2/head
2025-12-04T09:33:41.8070616Z  * [new branch]              gh/silverguo/3/base         -> origin/gh/silverguo/3/base
2025-12-04T09:33:41.8072423Z  * [new branch]              gh/silverguo/3/head         -> origin/gh/silverguo/3/head
2025-12-04T09:33:41.8074029Z  * [new branch]              gh/silverguo/4/base         -> origin/gh/silverguo/4/base
2025-12-04T09:33:41.8075239Z  * [new branch]              gh/silverguo/4/head         -> origin/gh/silverguo/4/head
2025-12-04T09:33:41.8077366Z  * [new branch]              gh/slayton58/39/base        -> origin/gh/slayton58/39/base
2025-12-04T09:33:41.8078567Z  * [new branch]              gh/slayton58/39/head        -> origin/gh/slayton58/39/head
2025-12-04T09:33:41.8080097Z  * [new branch]              gh/slayton58/39/orig        -> origin/gh/slayton58/39/orig
2025-12-04T09:33:41.8081849Z  * [new branch]              gh/slayton58/42/base        -> origin/gh/slayton58/42/base
2025-12-04T09:33:41.8083165Z  * [new branch]              gh/slayton58/42/head        -> origin/gh/slayton58/42/head
2025-12-04T09:33:41.8084672Z  * [new branch]              gh/slayton58/42/orig        -> origin/gh/slayton58/42/orig
2025-12-04T09:33:41.8086499Z  * [new branch]              gh/slayton58/43/base        -> origin/gh/slayton58/43/base
2025-12-04T09:33:41.8087674Z  * [new branch]              gh/slayton58/43/head        -> origin/gh/slayton58/43/head
2025-12-04T09:33:41.8089452Z  * [new branch]              gh/slayton58/43/orig        -> origin/gh/slayton58/43/orig
2025-12-04T09:33:41.8091345Z  * [new branch]              gh/slayton58/44/base        -> origin/gh/slayton58/44/base
2025-12-04T09:33:41.8092648Z  * [new branch]              gh/slayton58/44/head        -> origin/gh/slayton58/44/head
2025-12-04T09:33:41.8093871Z  * [new branch]              gh/slayton58/44/orig        -> origin/gh/slayton58/44/orig
2025-12-04T09:33:41.8095727Z  * [new branch]              gh/slayton58/45/base        -> origin/gh/slayton58/45/base
2025-12-04T09:33:41.8096905Z  * [new branch]              gh/slayton58/45/head        -> origin/gh/slayton58/45/head
2025-12-04T09:33:41.8098177Z  * [new branch]              gh/slayton58/45/orig        -> origin/gh/slayton58/45/orig
2025-12-04T09:33:41.8099968Z  * [new branch]              gh/slayton58/46/base        -> origin/gh/slayton58/46/base
2025-12-04T09:33:41.8101449Z  * [new branch]              gh/slayton58/46/head        -> origin/gh/slayton58/46/head
2025-12-04T09:33:41.8103484Z  * [new branch]              gh/slayton58/46/orig        -> origin/gh/slayton58/46/orig
2025-12-04T09:33:41.8105354Z  * [new branch]              gh/slayton58/6/base         -> origin/gh/slayton58/6/base
2025-12-04T09:33:41.8106588Z  * [new branch]              gh/slayton58/6/head         -> origin/gh/slayton58/6/head
2025-12-04T09:33:41.8108207Z  * [new branch]              gh/slayton58/7/base         -> origin/gh/slayton58/7/base
2025-12-04T09:33:41.8109320Z  * [new branch]              gh/slayton58/7/head         -> origin/gh/slayton58/7/head
2025-12-04T09:33:41.8111787Z  * [new branch]              gh/soulitzer/269/base       -> origin/gh/soulitzer/269/base
2025-12-04T09:33:41.8112896Z  * [new branch]              gh/soulitzer/269/head       -> origin/gh/soulitzer/269/head
2025-12-04T09:33:41.8114187Z  * [new branch]              gh/soulitzer/269/orig       -> origin/gh/soulitzer/269/orig
2025-12-04T09:33:41.8116097Z  * [new branch]              gh/soulitzer/276/base       -> origin/gh/soulitzer/276/base
2025-12-04T09:33:41.8117308Z  * [new branch]              gh/soulitzer/276/head       -> origin/gh/soulitzer/276/head
2025-12-04T09:33:41.8118574Z  * [new branch]              gh/soulitzer/276/orig       -> origin/gh/soulitzer/276/orig
2025-12-04T09:33:41.8120875Z  * [new branch]              gh/soulitzer/287/base       -> origin/gh/soulitzer/287/base
2025-12-04T09:33:41.8122067Z  * [new branch]              gh/soulitzer/287/head       -> origin/gh/soulitzer/287/head
2025-12-04T09:33:41.8123840Z  * [new branch]              gh/soulitzer/287/orig       -> origin/gh/soulitzer/287/orig
2025-12-04T09:33:41.8125713Z  * [new branch]              gh/soulitzer/296/base       -> origin/gh/soulitzer/296/base
2025-12-04T09:33:41.8126957Z  * [new branch]              gh/soulitzer/296/head       -> origin/gh/soulitzer/296/head
2025-12-04T09:33:41.8128241Z  * [new branch]              gh/soulitzer/296/orig       -> origin/gh/soulitzer/296/orig
2025-12-04T09:33:41.8130086Z  * [new branch]              gh/soulitzer/299/base       -> origin/gh/soulitzer/299/base
2025-12-04T09:33:41.8131356Z  * [new branch]              gh/soulitzer/299/head       -> origin/gh/soulitzer/299/head
2025-12-04T09:33:41.8132693Z  * [new branch]              gh/soulitzer/299/orig       -> origin/gh/soulitzer/299/orig
2025-12-04T09:33:41.8134588Z  * [new branch]              gh/soulitzer/300/base       -> origin/gh/soulitzer/300/base
2025-12-04T09:33:41.8135858Z  * [new branch]              gh/soulitzer/300/head       -> origin/gh/soulitzer/300/head
2025-12-04T09:33:41.8137120Z  * [new branch]              gh/soulitzer/300/orig       -> origin/gh/soulitzer/300/orig
2025-12-04T09:33:41.8139177Z  * [new branch]              gh/soulitzer/301/base       -> origin/gh/soulitzer/301/base
2025-12-04T09:33:41.8140486Z  * [new branch]              gh/soulitzer/301/head       -> origin/gh/soulitzer/301/head
2025-12-04T09:33:41.8141733Z  * [new branch]              gh/soulitzer/301/orig       -> origin/gh/soulitzer/301/orig
2025-12-04T09:33:41.8143558Z  * [new branch]              gh/soulitzer/313/base       -> origin/gh/soulitzer/313/base
2025-12-04T09:33:41.8144722Z  * [new branch]              gh/soulitzer/313/head       -> origin/gh/soulitzer/313/head
2025-12-04T09:33:41.8146169Z  * [new branch]              gh/soulitzer/313/orig       -> origin/gh/soulitzer/313/orig
2025-12-04T09:33:41.8147933Z  * [new branch]              gh/soulitzer/319/base       -> origin/gh/soulitzer/319/base
2025-12-04T09:33:41.8149113Z  * [new branch]              gh/soulitzer/319/head       -> origin/gh/soulitzer/319/head
2025-12-04T09:33:41.8150390Z  * [new branch]              gh/soulitzer/319/orig       -> origin/gh/soulitzer/319/orig
2025-12-04T09:33:41.8152361Z  * [new branch]              gh/soulitzer/320/base       -> origin/gh/soulitzer/320/base
2025-12-04T09:33:41.8153490Z  * [new branch]              gh/soulitzer/320/head       -> origin/gh/soulitzer/320/head
2025-12-04T09:33:41.8154762Z  * [new branch]              gh/soulitzer/320/orig       -> origin/gh/soulitzer/320/orig
2025-12-04T09:33:41.8156787Z  * [new branch]              gh/soulitzer/336/base       -> origin/gh/soulitzer/336/base
2025-12-04T09:33:41.8157936Z  * [new branch]              gh/soulitzer/336/head       -> origin/gh/soulitzer/336/head
2025-12-04T09:33:41.8159264Z  * [new branch]              gh/soulitzer/336/orig       -> origin/gh/soulitzer/336/orig
2025-12-04T09:33:41.8161081Z  * [new branch]              gh/soulitzer/347/base       -> origin/gh/soulitzer/347/base
2025-12-04T09:33:41.8162272Z  * [new branch]              gh/soulitzer/347/head       -> origin/gh/soulitzer/347/head
2025-12-04T09:33:41.8163662Z  * [new branch]              gh/soulitzer/347/orig       -> origin/gh/soulitzer/347/orig
2025-12-04T09:33:41.8165783Z  * [new branch]              gh/soulitzer/349/base       -> origin/gh/soulitzer/349/base
2025-12-04T09:33:41.8167522Z  * [new branch]              gh/soulitzer/349/head       -> origin/gh/soulitzer/349/head
2025-12-04T09:33:41.8168767Z  * [new branch]              gh/soulitzer/349/orig       -> origin/gh/soulitzer/349/orig
2025-12-04T09:33:41.8170501Z  * [new branch]              gh/soulitzer/350/base       -> origin/gh/soulitzer/350/base
2025-12-04T09:33:41.8171633Z  * [new branch]              gh/soulitzer/350/head       -> origin/gh/soulitzer/350/head
2025-12-04T09:33:41.8172891Z  * [new branch]              gh/soulitzer/350/orig       -> origin/gh/soulitzer/350/orig
2025-12-04T09:33:41.8174851Z  * [new branch]              gh/soulitzer/351/base       -> origin/gh/soulitzer/351/base
2025-12-04T09:33:41.8176021Z  * [new branch]              gh/soulitzer/351/head       -> origin/gh/soulitzer/351/head
2025-12-04T09:33:41.8177284Z  * [new branch]              gh/soulitzer/351/orig       -> origin/gh/soulitzer/351/orig
2025-12-04T09:33:41.8179104Z  * [new branch]              gh/soulitzer/353/base       -> origin/gh/soulitzer/353/base
2025-12-04T09:33:41.8180389Z  * [new branch]              gh/soulitzer/353/head       -> origin/gh/soulitzer/353/head
2025-12-04T09:33:41.8181692Z  * [new branch]              gh/soulitzer/353/orig       -> origin/gh/soulitzer/353/orig
2025-12-04T09:33:41.8184186Z  * [new branch]              gh/soulitzer/358/base       -> origin/gh/soulitzer/358/base
2025-12-04T09:33:41.8186002Z  * [new branch]              gh/soulitzer/358/head       -> origin/gh/soulitzer/358/head
2025-12-04T09:33:41.8187176Z  * [new branch]              gh/soulitzer/358/orig       -> origin/gh/soulitzer/358/orig
2025-12-04T09:33:41.8189652Z  * [new branch]              gh/soulitzer/359/base       -> origin/gh/soulitzer/359/base
2025-12-04T09:33:41.8190866Z  * [new branch]              gh/soulitzer/359/head       -> origin/gh/soulitzer/359/head
2025-12-04T09:33:41.8192179Z  * [new branch]              gh/soulitzer/359/orig       -> origin/gh/soulitzer/359/orig
2025-12-04T09:33:41.8194137Z  * [new branch]              gh/soulitzer/374/base       -> origin/gh/soulitzer/374/base
2025-12-04T09:33:41.8195344Z  * [new branch]              gh/soulitzer/374/head       -> origin/gh/soulitzer/374/head
2025-12-04T09:33:41.8196586Z  * [new branch]              gh/soulitzer/374/orig       -> origin/gh/soulitzer/374/orig
2025-12-04T09:33:41.8198492Z  * [new branch]              gh/soulitzer/375/base       -> origin/gh/soulitzer/375/base
2025-12-04T09:33:41.8199666Z  * [new branch]              gh/soulitzer/375/head       -> origin/gh/soulitzer/375/head
2025-12-04T09:33:41.8201003Z  * [new branch]              gh/soulitzer/375/orig       -> origin/gh/soulitzer/375/orig
2025-12-04T09:33:41.8202990Z  * [new branch]              gh/soulitzer/380/base       -> origin/gh/soulitzer/380/base
2025-12-04T09:33:41.8204243Z  * [new branch]              gh/soulitzer/380/head       -> origin/gh/soulitzer/380/head
2025-12-04T09:33:41.8205546Z  * [new branch]              gh/soulitzer/380/orig       -> origin/gh/soulitzer/380/orig
2025-12-04T09:33:41.8207352Z  * [new branch]              gh/soulitzer/385/base       -> origin/gh/soulitzer/385/base
2025-12-04T09:33:41.8208578Z  * [new branch]              gh/soulitzer/385/head       -> origin/gh/soulitzer/385/head
2025-12-04T09:33:41.8209862Z  * [new branch]              gh/soulitzer/385/orig       -> origin/gh/soulitzer/385/orig
2025-12-04T09:33:41.8211842Z  * [new branch]              gh/soulitzer/386/base       -> origin/gh/soulitzer/386/base
2025-12-04T09:33:41.8213000Z  * [new branch]              gh/soulitzer/386/head       -> origin/gh/soulitzer/386/head
2025-12-04T09:33:41.8214253Z  * [new branch]              gh/soulitzer/386/orig       -> origin/gh/soulitzer/386/orig
2025-12-04T09:33:41.8216091Z  * [new branch]              gh/soulitzer/387/base       -> origin/gh/soulitzer/387/base
2025-12-04T09:33:41.8217258Z  * [new branch]              gh/soulitzer/387/head       -> origin/gh/soulitzer/387/head
2025-12-04T09:33:41.8218491Z  * [new branch]              gh/soulitzer/387/orig       -> origin/gh/soulitzer/387/orig
2025-12-04T09:33:41.8220297Z  * [new branch]              gh/soulitzer/388/base       -> origin/gh/soulitzer/388/base
2025-12-04T09:33:41.8221466Z  * [new branch]              gh/soulitzer/388/head       -> origin/gh/soulitzer/388/head
2025-12-04T09:33:41.8222751Z  * [new branch]              gh/soulitzer/388/orig       -> origin/gh/soulitzer/388/orig
2025-12-04T09:33:41.8224587Z  * [new branch]              gh/soulitzer/389/base       -> origin/gh/soulitzer/389/base
2025-12-04T09:33:41.8225758Z  * [new branch]              gh/soulitzer/389/head       -> origin/gh/soulitzer/389/head
2025-12-04T09:33:41.8227028Z  * [new branch]              gh/soulitzer/389/orig       -> origin/gh/soulitzer/389/orig
2025-12-04T09:33:41.8229021Z  * [new branch]              gh/soulitzer/390/base       -> origin/gh/soulitzer/390/base
2025-12-04T09:33:41.8230184Z  * [new branch]              gh/soulitzer/390/head       -> origin/gh/soulitzer/390/head
2025-12-04T09:33:41.8231460Z  * [new branch]              gh/soulitzer/390/orig       -> origin/gh/soulitzer/390/orig
2025-12-04T09:33:41.8233261Z  * [new branch]              gh/soulitzer/391/base       -> origin/gh/soulitzer/391/base
2025-12-04T09:33:41.8234422Z  * [new branch]              gh/soulitzer/391/head       -> origin/gh/soulitzer/391/head
2025-12-04T09:33:41.8235693Z  * [new branch]              gh/soulitzer/391/orig       -> origin/gh/soulitzer/391/orig
2025-12-04T09:33:41.8237494Z  * [new branch]              gh/soulitzer/392/base       -> origin/gh/soulitzer/392/base
2025-12-04T09:33:41.8238654Z  * [new branch]              gh/soulitzer/392/head       -> origin/gh/soulitzer/392/head
2025-12-04T09:33:41.8239897Z  * [new branch]              gh/soulitzer/392/orig       -> origin/gh/soulitzer/392/orig
2025-12-04T09:33:41.8242630Z  * [new branch]              gh/swolchok/728/next        -> origin/gh/swolchok/728/next
2025-12-04T09:33:41.8245197Z  * [new branch]              gh/swolchok/819/base        -> origin/gh/swolchok/819/base
2025-12-04T09:33:41.8246406Z  * [new branch]              gh/swolchok/819/head        -> origin/gh/swolchok/819/head
2025-12-04T09:33:41.8247872Z  * [new branch]              gh/swolchok/819/orig        -> origin/gh/swolchok/819/orig
2025-12-04T09:33:41.8249614Z  * [new branch]              gh/swolchok/824/base        -> origin/gh/swolchok/824/base
2025-12-04T09:33:41.8250942Z  * [new branch]              gh/swolchok/824/head        -> origin/gh/swolchok/824/head
2025-12-04T09:33:41.8252091Z  * [new branch]              gh/swolchok/824/orig        -> origin/gh/swolchok/824/orig
2025-12-04T09:33:41.8253931Z  * [new branch]              gh/swolchok/829/base        -> origin/gh/swolchok/829/base
2025-12-04T09:33:41.8255020Z  * [new branch]              gh/swolchok/829/head        -> origin/gh/swolchok/829/head
2025-12-04T09:33:41.8256334Z  * [new branch]              gh/swolchok/829/orig        -> origin/gh/swolchok/829/orig
2025-12-04T09:33:41.8258211Z  * [new branch]              gh/swolchok/839/base        -> origin/gh/swolchok/839/base
2025-12-04T09:33:41.8259408Z  * [new branch]              gh/swolchok/839/head        -> origin/gh/swolchok/839/head
2025-12-04T09:33:41.8260657Z  * [new branch]              gh/swolchok/839/orig        -> origin/gh/swolchok/839/orig
2025-12-04T09:33:41.8262439Z  * [new branch]              gh/swolchok/841/base        -> origin/gh/swolchok/841/base
2025-12-04T09:33:41.8263707Z  * [new branch]              gh/swolchok/841/head        -> origin/gh/swolchok/841/head
2025-12-04T09:33:41.8265126Z  * [new branch]              gh/swolchok/841/orig        -> origin/gh/swolchok/841/orig
2025-12-04T09:33:41.8266893Z  * [new branch]              gh/swolchok/842/base        -> origin/gh/swolchok/842/base
2025-12-04T09:33:41.8268054Z  * [new branch]              gh/swolchok/842/head        -> origin/gh/swolchok/842/head
2025-12-04T09:33:41.8269337Z  * [new branch]              gh/swolchok/842/orig        -> origin/gh/swolchok/842/orig
2025-12-04T09:33:41.8271094Z  * [new branch]              gh/swolchok/845/base        -> origin/gh/swolchok/845/base
2025-12-04T09:33:41.8272265Z  * [new branch]              gh/swolchok/845/head        -> origin/gh/swolchok/845/head
2025-12-04T09:33:41.8273770Z  * [new branch]              gh/swolchok/845/orig        -> origin/gh/swolchok/845/orig
2025-12-04T09:33:41.8275985Z  * [new branch]              gh/swolchok/848/base        -> origin/gh/swolchok/848/base
2025-12-04T09:33:41.8277271Z  * [new branch]              gh/swolchok/848/head        -> origin/gh/swolchok/848/head
2025-12-04T09:33:41.8278549Z  * [new branch]              gh/swolchok/848/orig        -> origin/gh/swolchok/848/orig
2025-12-04T09:33:41.8280423Z  * [new branch]              gh/swolchok/856/base        -> origin/gh/swolchok/856/base
2025-12-04T09:33:41.8281923Z  * [new branch]              gh/swolchok/856/head        -> origin/gh/swolchok/856/head
2025-12-04T09:33:41.8283176Z  * [new branch]              gh/swolchok/856/orig        -> origin/gh/swolchok/856/orig
2025-12-04T09:33:41.8285131Z  * [new branch]              gh/swolchok/860/base        -> origin/gh/swolchok/860/base
2025-12-04T09:33:41.8286955Z  * [new branch]              gh/swolchok/860/head        -> origin/gh/swolchok/860/head
2025-12-04T09:33:41.8288130Z  * [new branch]              gh/swolchok/860/orig        -> origin/gh/swolchok/860/orig
2025-12-04T09:33:41.8290731Z  * [new branch]              gh/swolchok/861/base        -> origin/gh/swolchok/861/base
2025-12-04T09:33:41.8291980Z  * [new branch]              gh/swolchok/861/head        -> origin/gh/swolchok/861/head
2025-12-04T09:33:41.8293435Z  * [new branch]              gh/swolchok/861/orig        -> origin/gh/swolchok/861/orig
2025-12-04T09:33:41.8295218Z  * [new branch]              gh/swolchok/862/base        -> origin/gh/swolchok/862/base
2025-12-04T09:33:41.8296356Z  * [new branch]              gh/swolchok/862/head        -> origin/gh/swolchok/862/head
2025-12-04T09:33:41.8297575Z  * [new branch]              gh/swolchok/862/orig        -> origin/gh/swolchok/862/orig
2025-12-04T09:33:41.8299577Z  * [new branch]              gh/swolchok/863/base        -> origin/gh/swolchok/863/base
2025-12-04T09:33:41.8301062Z  * [new branch]              gh/swolchok/863/head        -> origin/gh/swolchok/863/head
2025-12-04T09:33:41.8302674Z  * [new branch]              gh/swolchok/863/orig        -> origin/gh/swolchok/863/orig
2025-12-04T09:33:41.8304503Z  * [new branch]              gh/swolchok/864/base        -> origin/gh/swolchok/864/base
2025-12-04T09:33:41.8305576Z  * [new branch]              gh/swolchok/864/head        -> origin/gh/swolchok/864/head
2025-12-04T09:33:41.8306992Z  * [new branch]              gh/swolchok/864/orig        -> origin/gh/swolchok/864/orig
2025-12-04T09:33:41.8308764Z  * [new branch]              gh/swolchok/865/base        -> origin/gh/swolchok/865/base
2025-12-04T09:33:41.8310273Z  * [new branch]              gh/swolchok/865/head        -> origin/gh/swolchok/865/head
2025-12-04T09:33:41.8311496Z  * [new branch]              gh/swolchok/865/orig        -> origin/gh/swolchok/865/orig
2025-12-04T09:33:41.8313903Z  * [new branch]              gh/swolchok/866/base        -> origin/gh/swolchok/866/base
2025-12-04T09:33:41.8315099Z  * [new branch]              gh/swolchok/866/head        -> origin/gh/swolchok/866/head
2025-12-04T09:33:41.8316515Z  * [new branch]              gh/swolchok/866/orig        -> origin/gh/swolchok/866/orig
2025-12-04T09:33:41.8318298Z  * [new branch]              gh/swolchok/867/base        -> origin/gh/swolchok/867/base
2025-12-04T09:33:41.8319727Z  * [new branch]              gh/swolchok/867/head        -> origin/gh/swolchok/867/head
2025-12-04T09:33:41.8320923Z  * [new branch]              gh/swolchok/867/orig        -> origin/gh/swolchok/867/orig
2025-12-04T09:33:41.8323246Z  * [new branch]              gh/swolchok/868/base        -> origin/gh/swolchok/868/base
2025-12-04T09:33:41.8324419Z  * [new branch]              gh/swolchok/868/head        -> origin/gh/swolchok/868/head
2025-12-04T09:33:41.8325701Z  * [new branch]              gh/swolchok/868/orig        -> origin/gh/swolchok/868/orig
2025-12-04T09:33:41.8327590Z  * [new branch]              gh/swolchok/869/base        -> origin/gh/swolchok/869/base
2025-12-04T09:33:41.8328790Z  * [new branch]              gh/swolchok/869/head        -> origin/gh/swolchok/869/head
2025-12-04T09:33:41.8330655Z  * [new branch]              gh/swolchok/869/orig        -> origin/gh/swolchok/869/orig
2025-12-04T09:33:41.8332551Z  * [new branch]              gh/swolchok/870/base        -> origin/gh/swolchok/870/base
2025-12-04T09:33:41.8333702Z  * [new branch]              gh/swolchok/870/head        -> origin/gh/swolchok/870/head
2025-12-04T09:33:41.8334967Z  * [new branch]              gh/swolchok/870/orig        -> origin/gh/swolchok/870/orig
2025-12-04T09:33:41.8336896Z  * [new branch]              gh/swolchok/871/base        -> origin/gh/swolchok/871/base
2025-12-04T09:33:41.8338385Z  * [new branch]              gh/swolchok/871/head        -> origin/gh/swolchok/871/head
2025-12-04T09:33:41.8339918Z  * [new branch]              gh/swolchok/871/orig        -> origin/gh/swolchok/871/orig
2025-12-04T09:33:41.8342053Z  * [new branch]              gh/teja-rao/4/base          -> origin/gh/teja-rao/4/base
2025-12-04T09:33:41.8343899Z  * [new branch]              gh/teja-rao/4/head          -> origin/gh/teja-rao/4/head
2025-12-04T09:33:41.8345093Z  * [new branch]              gh/teja-rao/4/orig          -> origin/gh/teja-rao/4/orig
2025-12-04T09:33:41.8347306Z  * [new branch]              gh/tianyu-l/2/base          -> origin/gh/tianyu-l/2/base
2025-12-04T09:33:41.8348485Z  * [new branch]              gh/tianyu-l/2/head          -> origin/gh/tianyu-l/2/head
2025-12-04T09:33:41.8349766Z  * [new branch]              gh/tianyu-l/2/orig          -> origin/gh/tianyu-l/2/orig
2025-12-04T09:33:41.8351602Z  * [new branch]              gh/tianyu-l/3/base          -> origin/gh/tianyu-l/3/base
2025-12-04T09:33:41.8352786Z  * [new branch]              gh/tianyu-l/3/orig          -> origin/gh/tianyu-l/3/orig
2025-12-04T09:33:41.8354670Z  * [new branch]              gh/tianyu-l/4/base          -> origin/gh/tianyu-l/4/base
2025-12-04T09:33:41.8355828Z  * [new branch]              gh/tianyu-l/4/head          -> origin/gh/tianyu-l/4/head
2025-12-04T09:33:41.8357106Z  * [new branch]              gh/tianyu-l/4/orig          -> origin/gh/tianyu-l/4/orig
2025-12-04T09:33:41.8359790Z  * [new branch]              gh/tugsbayasgalan/10/base   -> origin/gh/tugsbayasgalan/10/base
2025-12-04T09:33:41.8360935Z  * [new branch]              gh/tugsbayasgalan/10/head   -> origin/gh/tugsbayasgalan/10/head
2025-12-04T09:33:41.8362285Z  * [new branch]              gh/tugsbayasgalan/10/orig   -> origin/gh/tugsbayasgalan/10/orig
2025-12-04T09:33:41.8364148Z  * [new branch]              gh/tugsbayasgalan/13/base   -> origin/gh/tugsbayasgalan/13/base
2025-12-04T09:33:41.8365327Z  * [new branch]              gh/tugsbayasgalan/13/head   -> origin/gh/tugsbayasgalan/13/head
2025-12-04T09:33:41.8366579Z  * [new branch]              gh/tugsbayasgalan/13/orig   -> origin/gh/tugsbayasgalan/13/orig
2025-12-04T09:33:41.8368599Z  * [new branch]              gh/tugsbayasgalan/17/base   -> origin/gh/tugsbayasgalan/17/base
2025-12-04T09:33:41.8369681Z  * [new branch]              gh/tugsbayasgalan/17/head   -> origin/gh/tugsbayasgalan/17/head
2025-12-04T09:33:41.8370973Z  * [new branch]              gh/tugsbayasgalan/17/orig   -> origin/gh/tugsbayasgalan/17/orig
2025-12-04T09:33:41.8372984Z  * [new branch]              gh/tugsbayasgalan/2/base    -> origin/gh/tugsbayasgalan/2/base
2025-12-04T09:33:41.8374185Z  * [new branch]              gh/tugsbayasgalan/2/head    -> origin/gh/tugsbayasgalan/2/head
2025-12-04T09:33:41.8375493Z  * [new branch]              gh/tugsbayasgalan/2/orig    -> origin/gh/tugsbayasgalan/2/orig
2025-12-04T09:33:41.8377685Z  * [new branch]              gh/tugsbayasgalan/28/base   -> origin/gh/tugsbayasgalan/28/base
2025-12-04T09:33:41.8378826Z  * [new branch]              gh/tugsbayasgalan/28/head   -> origin/gh/tugsbayasgalan/28/head
2025-12-04T09:33:41.8380061Z  * [new branch]              gh/tugsbayasgalan/28/orig   -> origin/gh/tugsbayasgalan/28/orig
2025-12-04T09:33:41.8381923Z  * [new branch]              gh/tugsbayasgalan/32/base   -> origin/gh/tugsbayasgalan/32/base
2025-12-04T09:33:41.8383117Z  * [new branch]              gh/tugsbayasgalan/32/head   -> origin/gh/tugsbayasgalan/32/head
2025-12-04T09:33:41.8384387Z  * [new branch]              gh/tugsbayasgalan/32/orig   -> origin/gh/tugsbayasgalan/32/orig
2025-12-04T09:33:41.8386320Z  * [new branch]              gh/tugsbayasgalan/35/base   -> origin/gh/tugsbayasgalan/35/base
2025-12-04T09:33:41.8387627Z  * [new branch]              gh/tugsbayasgalan/35/head   -> origin/gh/tugsbayasgalan/35/head
2025-12-04T09:33:41.8388856Z  * [new branch]              gh/tugsbayasgalan/35/orig   -> origin/gh/tugsbayasgalan/35/orig
2025-12-04T09:33:41.8390822Z  * [new branch]              gh/tugsbayasgalan/36/base   -> origin/gh/tugsbayasgalan/36/base
2025-12-04T09:33:41.8391959Z  * [new branch]              gh/tugsbayasgalan/36/head   -> origin/gh/tugsbayasgalan/36/head
2025-12-04T09:33:41.8393253Z  * [new branch]              gh/tugsbayasgalan/36/orig   -> origin/gh/tugsbayasgalan/36/orig
2025-12-04T09:33:41.8395096Z  * [new branch]              gh/tugsbayasgalan/37/base   -> origin/gh/tugsbayasgalan/37/base
2025-12-04T09:33:41.8396291Z  * [new branch]              gh/tugsbayasgalan/37/head   -> origin/gh/tugsbayasgalan/37/head
2025-12-04T09:33:41.8397550Z  * [new branch]              gh/tugsbayasgalan/37/orig   -> origin/gh/tugsbayasgalan/37/orig
2025-12-04T09:33:41.8399314Z  * [new branch]              gh/tugsbayasgalan/43/base   -> origin/gh/tugsbayasgalan/43/base
2025-12-04T09:33:41.8400525Z  * [new branch]              gh/tugsbayasgalan/43/head   -> origin/gh/tugsbayasgalan/43/head
2025-12-04T09:33:41.8404499Z  * [new branch]              gh/tugsbayasgalan/43/orig   -> origin/gh/tugsbayasgalan/43/orig
2025-12-04T09:33:41.8406076Z  * [new branch]              gh/tugsbayasgalan/48/base   -> origin/gh/tugsbayasgalan/48/base
2025-12-04T09:33:41.8407255Z  * [new branch]              gh/tugsbayasgalan/48/head   -> origin/gh/tugsbayasgalan/48/head
2025-12-04T09:33:41.8408526Z  * [new branch]              gh/tugsbayasgalan/48/orig   -> origin/gh/tugsbayasgalan/48/orig
2025-12-04T09:33:41.8410492Z  * [new branch]              gh/tugsbayasgalan/51/base   -> origin/gh/tugsbayasgalan/51/base
2025-12-04T09:33:41.8411809Z  * [new branch]              gh/tugsbayasgalan/51/head   -> origin/gh/tugsbayasgalan/51/head
2025-12-04T09:33:41.8412972Z  * [new branch]              gh/tugsbayasgalan/51/orig   -> origin/gh/tugsbayasgalan/51/orig
2025-12-04T09:33:41.8414614Z  * [new branch]              gh/tugsbayasgalan/52/base   -> origin/gh/tugsbayasgalan/52/base
2025-12-04T09:33:41.8415854Z  * [new branch]              gh/tugsbayasgalan/52/head   -> origin/gh/tugsbayasgalan/52/head
2025-12-04T09:33:41.8417127Z  * [new branch]              gh/tugsbayasgalan/52/orig   -> origin/gh/tugsbayasgalan/52/orig
2025-12-04T09:33:41.8419055Z  * [new branch]              gh/tugsbayasgalan/53/base   -> origin/gh/tugsbayasgalan/53/base
2025-12-04T09:33:41.8420195Z  * [new branch]              gh/tugsbayasgalan/53/head   -> origin/gh/tugsbayasgalan/53/head
2025-12-04T09:33:41.8421486Z  * [new branch]              gh/tugsbayasgalan/53/orig   -> origin/gh/tugsbayasgalan/53/orig
2025-12-04T09:33:41.8423442Z  * [new branch]              gh/tugsbayasgalan/55/base   -> origin/gh/tugsbayasgalan/55/base
2025-12-04T09:33:41.8424772Z  * [new branch]              gh/tugsbayasgalan/55/head   -> origin/gh/tugsbayasgalan/55/head
2025-12-04T09:33:41.8426039Z  * [new branch]              gh/tugsbayasgalan/55/orig   -> origin/gh/tugsbayasgalan/55/orig
2025-12-04T09:33:41.8428139Z  * [new branch]              gh/tugsbayasgalan/59/base   -> origin/gh/tugsbayasgalan/59/base
2025-12-04T09:33:41.8429393Z  * [new branch]              gh/tugsbayasgalan/59/head   -> origin/gh/tugsbayasgalan/59/head
2025-12-04T09:33:41.8430660Z  * [new branch]              gh/tugsbayasgalan/59/orig   -> origin/gh/tugsbayasgalan/59/orig
2025-12-04T09:33:41.8432388Z  * [new branch]              gh/tugsbayasgalan/6/base    -> origin/gh/tugsbayasgalan/6/base
2025-12-04T09:33:41.8433566Z  * [new branch]              gh/tugsbayasgalan/6/head    -> origin/gh/tugsbayasgalan/6/head
2025-12-04T09:33:41.8434812Z  * [new branch]              gh/tugsbayasgalan/6/orig    -> origin/gh/tugsbayasgalan/6/orig
2025-12-04T09:33:41.8436480Z  * [new branch]              gh/tugsbayasgalan/60/base   -> origin/gh/tugsbayasgalan/60/base
2025-12-04T09:33:41.8437660Z  * [new branch]              gh/tugsbayasgalan/60/head   -> origin/gh/tugsbayasgalan/60/head
2025-12-04T09:33:41.8438973Z  * [new branch]              gh/tugsbayasgalan/60/orig   -> origin/gh/tugsbayasgalan/60/orig
2025-12-04T09:33:41.8441284Z  * [new branch]              gh/tugsbayasgalan/61/base   -> origin/gh/tugsbayasgalan/61/base
2025-12-04T09:33:41.8442459Z  * [new branch]              gh/tugsbayasgalan/61/head   -> origin/gh/tugsbayasgalan/61/head
2025-12-04T09:33:41.8443828Z  * [new branch]              gh/tugsbayasgalan/61/orig   -> origin/gh/tugsbayasgalan/61/orig
2025-12-04T09:33:41.8445896Z  * [new branch]              gh/tugsbayasgalan/63/base   -> origin/gh/tugsbayasgalan/63/base
2025-12-04T09:33:41.8447117Z  * [new branch]              gh/tugsbayasgalan/63/head   -> origin/gh/tugsbayasgalan/63/head
2025-12-04T09:33:41.8448424Z  * [new branch]              gh/tugsbayasgalan/63/orig   -> origin/gh/tugsbayasgalan/63/orig
2025-12-04T09:33:41.8450325Z  * [new branch]              gh/tugsbayasgalan/67/base   -> origin/gh/tugsbayasgalan/67/base
2025-12-04T09:33:41.8451509Z  * [new branch]              gh/tugsbayasgalan/67/head   -> origin/gh/tugsbayasgalan/67/head
2025-12-04T09:33:41.8452774Z  * [new branch]              gh/tugsbayasgalan/67/orig   -> origin/gh/tugsbayasgalan/67/orig
2025-12-04T09:33:41.8454832Z  * [new branch]              gh/tugsbayasgalan/68/base   -> origin/gh/tugsbayasgalan/68/base
2025-12-04T09:33:41.8456029Z  * [new branch]              gh/tugsbayasgalan/68/head   -> origin/gh/tugsbayasgalan/68/head
2025-12-04T09:33:41.8457303Z  * [new branch]              gh/tugsbayasgalan/68/orig   -> origin/gh/tugsbayasgalan/68/orig
2025-12-04T09:33:41.8459181Z  * [new branch]              gh/tugsbayasgalan/7/base    -> origin/gh/tugsbayasgalan/7/base
2025-12-04T09:33:41.8460433Z  * [new branch]              gh/tugsbayasgalan/7/head    -> origin/gh/tugsbayasgalan/7/head
2025-12-04T09:33:41.8461843Z  * [new branch]              gh/tugsbayasgalan/7/orig    -> origin/gh/tugsbayasgalan/7/orig
2025-12-04T09:33:41.8464090Z  * [new branch]              gh/tugsbayasgalan/70/base   -> origin/gh/tugsbayasgalan/70/base
2025-12-04T09:33:41.8466000Z  * [new branch]              gh/tugsbayasgalan/70/head   -> origin/gh/tugsbayasgalan/70/head
2025-12-04T09:33:41.8467240Z  * [new branch]              gh/tugsbayasgalan/70/orig   -> origin/gh/tugsbayasgalan/70/orig
2025-12-04T09:33:41.8469303Z  * [new branch]              gh/tugsbayasgalan/71/base   -> origin/gh/tugsbayasgalan/71/base
2025-12-04T09:33:41.8470648Z  * [new branch]              gh/tugsbayasgalan/71/head   -> origin/gh/tugsbayasgalan/71/head
2025-12-04T09:33:41.8472023Z  * [new branch]              gh/tugsbayasgalan/71/orig   -> origin/gh/tugsbayasgalan/71/orig
2025-12-04T09:33:41.8474043Z  * [new branch]              gh/tugsbayasgalan/72/base   -> origin/gh/tugsbayasgalan/72/base
2025-12-04T09:33:41.8475298Z  * [new branch]              gh/tugsbayasgalan/72/head   -> origin/gh/tugsbayasgalan/72/head
2025-12-04T09:33:41.8476577Z  * [new branch]              gh/tugsbayasgalan/72/orig   -> origin/gh/tugsbayasgalan/72/orig
2025-12-04T09:33:41.8478504Z  * [new branch]              gh/tugsbayasgalan/73/base   -> origin/gh/tugsbayasgalan/73/base
2025-12-04T09:33:41.8479771Z  * [new branch]              gh/tugsbayasgalan/73/head   -> origin/gh/tugsbayasgalan/73/head
2025-12-04T09:33:41.8481064Z  * [new branch]              gh/tugsbayasgalan/73/orig   -> origin/gh/tugsbayasgalan/73/orig
2025-12-04T09:33:41.8483421Z  * [new branch]              gh/tugsbayasgalan/74/base   -> origin/gh/tugsbayasgalan/74/base
2025-12-04T09:33:41.8484687Z  * [new branch]              gh/tugsbayasgalan/74/head   -> origin/gh/tugsbayasgalan/74/head
2025-12-04T09:33:41.8485975Z  * [new branch]              gh/tugsbayasgalan/74/orig   -> origin/gh/tugsbayasgalan/74/orig
2025-12-04T09:33:41.8487908Z  * [new branch]              gh/tugsbayasgalan/75/base   -> origin/gh/tugsbayasgalan/75/base
2025-12-04T09:33:41.8489103Z  * [new branch]              gh/tugsbayasgalan/75/head   -> origin/gh/tugsbayasgalan/75/head
2025-12-04T09:33:41.8490379Z  * [new branch]              gh/tugsbayasgalan/75/orig   -> origin/gh/tugsbayasgalan/75/orig
2025-12-04T09:33:41.8492102Z  * [new branch]              gh/tugsbayasgalan/76/base   -> origin/gh/tugsbayasgalan/76/base
2025-12-04T09:33:41.8493372Z  * [new branch]              gh/tugsbayasgalan/76/head   -> origin/gh/tugsbayasgalan/76/head
2025-12-04T09:33:41.8494591Z  * [new branch]              gh/tugsbayasgalan/76/orig   -> origin/gh/tugsbayasgalan/76/orig
2025-12-04T09:33:41.8496638Z  * [new branch]              gh/tugsbayasgalan/77/base   -> origin/gh/tugsbayasgalan/77/base
2025-12-04T09:33:41.8497781Z  * [new branch]              gh/tugsbayasgalan/77/head   -> origin/gh/tugsbayasgalan/77/head
2025-12-04T09:33:41.8499061Z  * [new branch]              gh/tugsbayasgalan/77/orig   -> origin/gh/tugsbayasgalan/77/orig
2025-12-04T09:33:41.8501393Z  * [new branch]              gh/tugsbayasgalan/78/base   -> origin/gh/tugsbayasgalan/78/base
2025-12-04T09:33:41.8502766Z  * [new branch]              gh/tugsbayasgalan/78/head   -> origin/gh/tugsbayasgalan/78/head
2025-12-04T09:33:41.8504059Z  * [new branch]              gh/tugsbayasgalan/78/orig   -> origin/gh/tugsbayasgalan/78/orig
2025-12-04T09:33:41.8506025Z  * [new branch]              gh/tugsbayasgalan/79/base   -> origin/gh/tugsbayasgalan/79/base
2025-12-04T09:33:41.8507252Z  * [new branch]              gh/tugsbayasgalan/79/head   -> origin/gh/tugsbayasgalan/79/head
2025-12-04T09:33:41.8508525Z  * [new branch]              gh/tugsbayasgalan/79/orig   -> origin/gh/tugsbayasgalan/79/orig
2025-12-04T09:33:41.8510492Z  * [new branch]              gh/tugsbayasgalan/8/base    -> origin/gh/tugsbayasgalan/8/base
2025-12-04T09:33:41.8511624Z  * [new branch]              gh/tugsbayasgalan/8/head    -> origin/gh/tugsbayasgalan/8/head
2025-12-04T09:33:41.8513059Z  * [new branch]              gh/tugsbayasgalan/8/orig    -> origin/gh/tugsbayasgalan/8/orig
2025-12-04T09:33:41.8514753Z  * [new branch]              gh/tugsbayasgalan/80/base   -> origin/gh/tugsbayasgalan/80/base
2025-12-04T09:33:41.8516443Z  * [new branch]              gh/tugsbayasgalan/80/head   -> origin/gh/tugsbayasgalan/80/head
2025-12-04T09:33:41.8517599Z  * [new branch]              gh/tugsbayasgalan/80/orig   -> origin/gh/tugsbayasgalan/80/orig
2025-12-04T09:33:41.8519606Z  * [new branch]              gh/tugsbayasgalan/81/base   -> origin/gh/tugsbayasgalan/81/base
2025-12-04T09:33:41.8520718Z  * [new branch]              gh/tugsbayasgalan/81/head   -> origin/gh/tugsbayasgalan/81/head
2025-12-04T09:33:41.8522044Z  * [new branch]              gh/tugsbayasgalan/81/orig   -> origin/gh/tugsbayasgalan/81/orig
2025-12-04T09:33:41.8524805Z  * [new branch]              gh/tugsbayasgalan/82/base   -> origin/gh/tugsbayasgalan/82/base
2025-12-04T09:33:41.8526138Z  * [new branch]              gh/tugsbayasgalan/82/head   -> origin/gh/tugsbayasgalan/82/head
2025-12-04T09:33:41.8527448Z  * [new branch]              gh/tugsbayasgalan/82/orig   -> origin/gh/tugsbayasgalan/82/orig
2025-12-04T09:33:41.8529169Z  * [new branch]              gh/tugsbayasgalan/83/base   -> origin/gh/tugsbayasgalan/83/base
2025-12-04T09:33:41.8531085Z  * [new branch]              gh/tugsbayasgalan/83/head   -> origin/gh/tugsbayasgalan/83/head
2025-12-04T09:33:41.8532312Z  * [new branch]              gh/tugsbayasgalan/83/orig   -> origin/gh/tugsbayasgalan/83/orig
2025-12-04T09:33:41.8534527Z  * [new branch]              gh/tugsbayasgalan/84/base   -> origin/gh/tugsbayasgalan/84/base
2025-12-04T09:33:41.8535723Z  * [new branch]              gh/tugsbayasgalan/84/head   -> origin/gh/tugsbayasgalan/84/head
2025-12-04T09:33:41.8537019Z  * [new branch]              gh/tugsbayasgalan/84/orig   -> origin/gh/tugsbayasgalan/84/orig
2025-12-04T09:33:41.8539271Z  * [new branch]              gh/tugsbayasgalan/85/base   -> origin/gh/tugsbayasgalan/85/base
2025-12-04T09:33:41.8540462Z  * [new branch]              gh/tugsbayasgalan/85/head   -> origin/gh/tugsbayasgalan/85/head
2025-12-04T09:33:41.8541780Z  * [new branch]              gh/tugsbayasgalan/85/orig   -> origin/gh/tugsbayasgalan/85/orig
2025-12-04T09:33:41.8543669Z  * [new branch]              gh/tugsbayasgalan/86/base   -> origin/gh/tugsbayasgalan/86/base
2025-12-04T09:33:41.8544936Z  * [new branch]              gh/tugsbayasgalan/86/head   -> origin/gh/tugsbayasgalan/86/head
2025-12-04T09:33:41.8546207Z  * [new branch]              gh/tugsbayasgalan/86/orig   -> origin/gh/tugsbayasgalan/86/orig
2025-12-04T09:33:41.8548465Z  * [new branch]              gh/tugsbayasgalan/87/base   -> origin/gh/tugsbayasgalan/87/base
2025-12-04T09:33:41.8549636Z  * [new branch]              gh/tugsbayasgalan/87/head   -> origin/gh/tugsbayasgalan/87/head
2025-12-04T09:33:41.8550972Z  * [new branch]              gh/tugsbayasgalan/87/orig   -> origin/gh/tugsbayasgalan/87/orig
2025-12-04T09:33:41.8552903Z  * [new branch]              gh/tugsbayasgalan/88/base   -> origin/gh/tugsbayasgalan/88/base
2025-12-04T09:33:41.8554071Z  * [new branch]              gh/tugsbayasgalan/88/head   -> origin/gh/tugsbayasgalan/88/head
2025-12-04T09:33:41.8555378Z  * [new branch]              gh/tugsbayasgalan/88/orig   -> origin/gh/tugsbayasgalan/88/orig
2025-12-04T09:33:41.8557375Z  * [new branch]              gh/tugsbayasgalan/89/base   -> origin/gh/tugsbayasgalan/89/base
2025-12-04T09:33:41.8558956Z  * [new branch]              gh/tugsbayasgalan/89/head   -> origin/gh/tugsbayasgalan/89/head
2025-12-04T09:33:41.8560206Z  * [new branch]              gh/tugsbayasgalan/89/orig   -> origin/gh/tugsbayasgalan/89/orig
2025-12-04T09:33:41.8562030Z  * [new branch]              gh/tugsbayasgalan/9/base    -> origin/gh/tugsbayasgalan/9/base
2025-12-04T09:33:41.8563233Z  * [new branch]              gh/tugsbayasgalan/9/head    -> origin/gh/tugsbayasgalan/9/head
2025-12-04T09:33:41.8564544Z  * [new branch]              gh/tugsbayasgalan/9/orig    -> origin/gh/tugsbayasgalan/9/orig
2025-12-04T09:33:41.8567503Z  * [new branch]              gh/tugsbayasgalan/90/base   -> origin/gh/tugsbayasgalan/90/base
2025-12-04T09:33:41.8568538Z  * [new branch]              gh/tugsbayasgalan/90/head   -> origin/gh/tugsbayasgalan/90/head
2025-12-04T09:33:41.8569811Z  * [new branch]              gh/tugsbayasgalan/90/orig   -> origin/gh/tugsbayasgalan/90/orig
2025-12-04T09:33:41.8571959Z  * [new branch]              gh/tugsbayasgalan/91/base   -> origin/gh/tugsbayasgalan/91/base
2025-12-04T09:33:41.8573119Z  * [new branch]              gh/tugsbayasgalan/91/head   -> origin/gh/tugsbayasgalan/91/head
2025-12-04T09:33:41.8574355Z  * [new branch]              gh/tugsbayasgalan/91/orig   -> origin/gh/tugsbayasgalan/91/orig
2025-12-04T09:33:41.8576410Z  * [new branch]              gh/tugsbayasgalan/92/base   -> origin/gh/tugsbayasgalan/92/base
2025-12-04T09:33:41.8577633Z  * [new branch]              gh/tugsbayasgalan/92/head   -> origin/gh/tugsbayasgalan/92/head
2025-12-04T09:33:41.8579014Z  * [new branch]              gh/tugsbayasgalan/92/orig   -> origin/gh/tugsbayasgalan/92/orig
2025-12-04T09:33:41.8581019Z  * [new branch]              gh/tugsbayasgalan/93/base   -> origin/gh/tugsbayasgalan/93/base
2025-12-04T09:33:41.8582249Z  * [new branch]              gh/tugsbayasgalan/93/head   -> origin/gh/tugsbayasgalan/93/head
2025-12-04T09:33:41.8583626Z  * [new branch]              gh/tugsbayasgalan/93/orig   -> origin/gh/tugsbayasgalan/93/orig
2025-12-04T09:33:41.8585955Z  * [new branch]              gh/v0i0/14/base             -> origin/gh/v0i0/14/base
2025-12-04T09:33:41.8587058Z  * [new branch]              gh/v0i0/14/head             -> origin/gh/v0i0/14/head
2025-12-04T09:33:41.8588287Z  * [new branch]              gh/v0i0/14/orig             -> origin/gh/v0i0/14/orig
2025-12-04T09:33:41.8589937Z  * [new branch]              gh/v0i0/15/base             -> origin/gh/v0i0/15/base
2025-12-04T09:33:41.8591785Z  * [new branch]              gh/v0i0/15/head             -> origin/gh/v0i0/15/head
2025-12-04T09:33:41.8593148Z  * [new branch]              gh/v0i0/15/orig             -> origin/gh/v0i0/15/orig
2025-12-04T09:33:41.8594909Z  * [new branch]              gh/v0i0/16/base             -> origin/gh/v0i0/16/base
2025-12-04T09:33:41.8596084Z  * [new branch]              gh/v0i0/16/head             -> origin/gh/v0i0/16/head
2025-12-04T09:33:41.8597542Z  * [new branch]              gh/v0i0/16/orig             -> origin/gh/v0i0/16/orig
2025-12-04T09:33:41.8599211Z  * [new branch]              gh/v0i0/17/base             -> origin/gh/v0i0/17/base
2025-12-04T09:33:41.8600412Z  * [new branch]              gh/v0i0/17/head             -> origin/gh/v0i0/17/head
2025-12-04T09:33:41.8602034Z  * [new branch]              gh/v0i0/17/orig             -> origin/gh/v0i0/17/orig
2025-12-04T09:33:41.8603935Z  * [new branch]              gh/v0i0/18/base             -> origin/gh/v0i0/18/base
2025-12-04T09:33:41.8605200Z  * [new branch]              gh/v0i0/18/head             -> origin/gh/v0i0/18/head
2025-12-04T09:33:41.8606513Z  * [new branch]              gh/v0i0/18/orig             -> origin/gh/v0i0/18/orig
2025-12-04T09:33:41.8608322Z  * [new branch]              gh/v0i0/19/base             -> origin/gh/v0i0/19/base
2025-12-04T09:33:41.8609507Z  * [new branch]              gh/v0i0/19/head             -> origin/gh/v0i0/19/head
2025-12-04T09:33:41.8611001Z  * [new branch]              gh/v0i0/19/orig             -> origin/gh/v0i0/19/orig
2025-12-04T09:33:41.8613205Z  * [new branch]              gh/vishal9-team/1/base      -> origin/gh/vishal9-team/1/base
2025-12-04T09:33:41.8614439Z  * [new branch]              gh/vishal9-team/1/head      -> origin/gh/vishal9-team/1/head
2025-12-04T09:33:41.8616057Z  * [new branch]              gh/vishal9-team/2/base      -> origin/gh/vishal9-team/2/base
2025-12-04T09:33:41.8617259Z  * [new branch]              gh/vishal9-team/2/head      -> origin/gh/vishal9-team/2/head
2025-12-04T09:33:41.8618508Z  * [new branch]              gh/vishal9-team/2/orig      -> origin/gh/vishal9-team/2/orig
2025-12-04T09:33:41.8620432Z  * [new branch]              gh/vishal9-team/3/base      -> origin/gh/vishal9-team/3/base
2025-12-04T09:33:41.8621586Z  * [new branch]              gh/vishal9-team/3/head      -> origin/gh/vishal9-team/3/head
2025-12-04T09:33:41.8623042Z  * [new branch]              gh/vishal9-team/3/orig      -> origin/gh/vishal9-team/3/orig
2025-12-04T09:33:41.8624644Z  * [new branch]              gh/vishal9-team/4/base      -> origin/gh/vishal9-team/4/base
2025-12-04T09:33:41.8625812Z  * [new branch]              gh/vishal9-team/4/head      -> origin/gh/vishal9-team/4/head
2025-12-04T09:33:41.8627159Z  * [new branch]              gh/vishal9-team/4/orig      -> origin/gh/vishal9-team/4/orig
2025-12-04T09:33:41.8629319Z  * [new branch]              gh/vkuzo/1/next             -> origin/gh/vkuzo/1/next
2025-12-04T09:33:41.8631064Z  * [new branch]              gh/vkuzo/2/next             -> origin/gh/vkuzo/2/next
2025-12-04T09:33:41.8632792Z  * [new branch]              gh/vkuzo/3/next             -> origin/gh/vkuzo/3/next
2025-12-04T09:33:41.8634839Z  * [new branch]              gh/wconstab/424/base        -> origin/gh/wconstab/424/base
2025-12-04T09:33:41.8636139Z  * [new branch]              gh/wconstab/424/head        -> origin/gh/wconstab/424/head
2025-12-04T09:33:41.8637635Z  * [new branch]              gh/wconstab/424/orig        -> origin/gh/wconstab/424/orig
2025-12-04T09:33:41.8639402Z  * [new branch]              gh/wconstab/435/base        -> origin/gh/wconstab/435/base
2025-12-04T09:33:41.8640596Z  * [new branch]              gh/wconstab/435/head        -> origin/gh/wconstab/435/head
2025-12-04T09:33:41.8642064Z  * [new branch]              gh/wconstab/435/orig        -> origin/gh/wconstab/435/orig
2025-12-04T09:33:41.8643996Z  * [new branch]              gh/wconstab/444/base        -> origin/gh/wconstab/444/base
2025-12-04T09:33:41.8645269Z  * [new branch]              gh/wconstab/444/head        -> origin/gh/wconstab/444/head
2025-12-04T09:33:41.8646540Z  * [new branch]              gh/wconstab/444/orig        -> origin/gh/wconstab/444/orig
2025-12-04T09:33:41.8648391Z  * [new branch]              gh/wconstab/447/base        -> origin/gh/wconstab/447/base
2025-12-04T09:33:41.8649548Z  * [new branch]              gh/wconstab/447/head        -> origin/gh/wconstab/447/head
2025-12-04T09:33:41.8650835Z  * [new branch]              gh/wconstab/447/orig        -> origin/gh/wconstab/447/orig
2025-12-04T09:33:41.8652674Z  * [new branch]              gh/wconstab/448/base        -> origin/gh/wconstab/448/base
2025-12-04T09:33:41.8653900Z  * [new branch]              gh/wconstab/448/head        -> origin/gh/wconstab/448/head
2025-12-04T09:33:41.8655190Z  * [new branch]              gh/wconstab/448/orig        -> origin/gh/wconstab/448/orig
2025-12-04T09:33:41.8656887Z  * [new branch]              gh/wconstab/449/base        -> origin/gh/wconstab/449/base
2025-12-04T09:33:41.8658110Z  * [new branch]              gh/wconstab/449/head        -> origin/gh/wconstab/449/head
2025-12-04T09:33:41.8659683Z  * [new branch]              gh/wconstab/449/orig        -> origin/gh/wconstab/449/orig
2025-12-04T09:33:41.8661196Z  * [new branch]              gh/wconstab/450/base        -> origin/gh/wconstab/450/base
2025-12-04T09:33:41.8662515Z  * [new branch]              gh/wconstab/450/head        -> origin/gh/wconstab/450/head
2025-12-04T09:33:41.8663806Z  * [new branch]              gh/wconstab/450/orig        -> origin/gh/wconstab/450/orig
2025-12-04T09:33:41.8665434Z  * [new branch]              gh/wconstab/451/base        -> origin/gh/wconstab/451/base
2025-12-04T09:33:41.8666933Z  * [new branch]              gh/wconstab/451/head        -> origin/gh/wconstab/451/head
2025-12-04T09:33:41.8668102Z  * [new branch]              gh/wconstab/451/orig        -> origin/gh/wconstab/451/orig
2025-12-04T09:33:41.8670003Z  * [new branch]              gh/wconstab/452/base        -> origin/gh/wconstab/452/base
2025-12-04T09:33:41.8671134Z  * [new branch]              gh/wconstab/452/head        -> origin/gh/wconstab/452/head
2025-12-04T09:33:41.8672463Z  * [new branch]              gh/wconstab/452/orig        -> origin/gh/wconstab/452/orig
2025-12-04T09:33:41.8674033Z  * [new branch]              gh/wconstab/453/base        -> origin/gh/wconstab/453/base
2025-12-04T09:33:41.8675312Z  * [new branch]              gh/wconstab/453/head        -> origin/gh/wconstab/453/head
2025-12-04T09:33:41.8676899Z  * [new branch]              gh/wconstab/453/orig        -> origin/gh/wconstab/453/orig
2025-12-04T09:33:41.8678500Z  * [new branch]              gh/wconstab/454/base        -> origin/gh/wconstab/454/base
2025-12-04T09:33:41.8679683Z  * [new branch]              gh/wconstab/454/head        -> origin/gh/wconstab/454/head
2025-12-04T09:33:41.8680964Z  * [new branch]              gh/wconstab/454/orig        -> origin/gh/wconstab/454/orig
2025-12-04T09:33:41.8682841Z  * [new branch]              gh/wconstab/455/base        -> origin/gh/wconstab/455/base
2025-12-04T09:33:41.8684080Z  * [new branch]              gh/wconstab/455/head        -> origin/gh/wconstab/455/head
2025-12-04T09:33:41.8685395Z  * [new branch]              gh/wconstab/455/orig        -> origin/gh/wconstab/455/orig
2025-12-04T09:33:41.8687995Z  * [new branch]              gh/wconstab/456/base        -> origin/gh/wconstab/456/base
2025-12-04T09:33:41.8689668Z  * [new branch]              gh/wconstab/456/head        -> origin/gh/wconstab/456/head
2025-12-04T09:33:41.8691049Z  * [new branch]              gh/wconstab/456/orig        -> origin/gh/wconstab/456/orig
2025-12-04T09:33:41.8694266Z  * [new branch]              gh/wconstab/457/base        -> origin/gh/wconstab/457/base
2025-12-04T09:33:41.8695075Z  * [new branch]              gh/wconstab/457/head        -> origin/gh/wconstab/457/head
2025-12-04T09:33:41.8696094Z  * [new branch]              gh/wconstab/457/orig        -> origin/gh/wconstab/457/orig
2025-12-04T09:33:41.8697689Z  * [new branch]              gh/wconstab/458/base        -> origin/gh/wconstab/458/base
2025-12-04T09:33:41.8699005Z  * [new branch]              gh/wconstab/458/head        -> origin/gh/wconstab/458/head
2025-12-04T09:33:41.8700317Z  * [new branch]              gh/wconstab/458/orig        -> origin/gh/wconstab/458/orig
2025-12-04T09:33:41.8702134Z  * [new branch]              gh/wconstab/459/base        -> origin/gh/wconstab/459/base
2025-12-04T09:33:41.8703502Z  * [new branch]              gh/wconstab/459/head        -> origin/gh/wconstab/459/head
2025-12-04T09:33:41.8704700Z  * [new branch]              gh/wconstab/459/orig        -> origin/gh/wconstab/459/orig
2025-12-04T09:33:41.8707153Z  * [new branch]              gh/wconstab/460/base        -> origin/gh/wconstab/460/base
2025-12-04T09:33:41.8708741Z  * [new branch]              gh/wconstab/460/head        -> origin/gh/wconstab/460/head
2025-12-04T09:33:41.8710180Z  * [new branch]              gh/wconstab/460/orig        -> origin/gh/wconstab/460/orig
2025-12-04T09:33:41.8712188Z  * [new branch]              gh/wconstab/461/base        -> origin/gh/wconstab/461/base
2025-12-04T09:33:41.8713476Z  * [new branch]              gh/wconstab/461/head        -> origin/gh/wconstab/461/head
2025-12-04T09:33:41.8715402Z  * [new branch]              gh/wconstab/461/orig        -> origin/gh/wconstab/461/orig
2025-12-04T09:33:41.8717030Z  * [new branch]              gh/wconstab/462/base        -> origin/gh/wconstab/462/base
2025-12-04T09:33:41.8718412Z  * [new branch]              gh/wconstab/462/head        -> origin/gh/wconstab/462/head
2025-12-04T09:33:41.8719801Z  * [new branch]              gh/wconstab/462/orig        -> origin/gh/wconstab/462/orig
2025-12-04T09:33:41.8721617Z  * [new branch]              gh/wconstab/463/base        -> origin/gh/wconstab/463/base
2025-12-04T09:33:41.8723128Z  * [new branch]              gh/wconstab/463/head        -> origin/gh/wconstab/463/head
2025-12-04T09:33:41.8724413Z  * [new branch]              gh/wconstab/463/orig        -> origin/gh/wconstab/463/orig
2025-12-04T09:33:41.8726183Z  * [new branch]              gh/wconstab/464/base        -> origin/gh/wconstab/464/base
2025-12-04T09:33:41.8727624Z  * [new branch]              gh/wconstab/464/head        -> origin/gh/wconstab/464/head
2025-12-04T09:33:41.8728906Z  * [new branch]              gh/wconstab/464/orig        -> origin/gh/wconstab/464/orig
2025-12-04T09:33:41.8730567Z  * [new branch]              gh/wconstab/465/base        -> origin/gh/wconstab/465/base
2025-12-04T09:33:41.8731905Z  * [new branch]              gh/wconstab/465/head        -> origin/gh/wconstab/465/head
2025-12-04T09:33:41.8733193Z  * [new branch]              gh/wconstab/465/orig        -> origin/gh/wconstab/465/orig
2025-12-04T09:33:41.8735073Z  * [new branch]              gh/wconstab/466/base        -> origin/gh/wconstab/466/base
2025-12-04T09:33:41.8736260Z  * [new branch]              gh/wconstab/466/head        -> origin/gh/wconstab/466/head
2025-12-04T09:33:41.8737440Z  * [new branch]              gh/wconstab/466/orig        -> origin/gh/wconstab/466/orig
2025-12-04T09:33:41.8739594Z  * [new branch]              gh/wconstab/467/base        -> origin/gh/wconstab/467/base
2025-12-04T09:33:41.8741025Z  * [new branch]              gh/wconstab/467/head        -> origin/gh/wconstab/467/head
2025-12-04T09:33:41.8742275Z  * [new branch]              gh/wconstab/467/orig        -> origin/gh/wconstab/467/orig
2025-12-04T09:33:41.8743869Z  * [new branch]              gh/wconstab/468/base        -> origin/gh/wconstab/468/base
2025-12-04T09:33:41.8745152Z  * [new branch]              gh/wconstab/468/head        -> origin/gh/wconstab/468/head
2025-12-04T09:33:41.8746402Z  * [new branch]              gh/wconstab/468/orig        -> origin/gh/wconstab/468/orig
2025-12-04T09:33:41.8748711Z  * [new branch]              gh/weifengpy/39/base        -> origin/gh/weifengpy/39/base
2025-12-04T09:33:41.8750093Z  * [new branch]              gh/weifengpy/39/head        -> origin/gh/weifengpy/39/head
2025-12-04T09:33:41.8751490Z  * [new branch]              gh/weifengpy/39/orig        -> origin/gh/weifengpy/39/orig
2025-12-04T09:33:41.8753402Z  * [new branch]              gh/weifengpy/40/base        -> origin/gh/weifengpy/40/base
2025-12-04T09:33:41.8754731Z  * [new branch]              gh/weifengpy/40/head        -> origin/gh/weifengpy/40/head
2025-12-04T09:33:41.8755994Z  * [new branch]              gh/weifengpy/40/orig        -> origin/gh/weifengpy/40/orig
2025-12-04T09:33:41.8757847Z  * [new branch]              gh/weifengpy/41/base        -> origin/gh/weifengpy/41/base
2025-12-04T09:33:41.8759236Z  * [new branch]              gh/weifengpy/41/head        -> origin/gh/weifengpy/41/head
2025-12-04T09:33:41.8760623Z  * [new branch]              gh/weifengpy/41/orig        -> origin/gh/weifengpy/41/orig
2025-12-04T09:33:41.8763014Z  * [new branch]              gh/williamwen42/250/base    -> origin/gh/williamwen42/250/base
2025-12-04T09:33:41.8764338Z  * [new branch]              gh/williamwen42/250/head    -> origin/gh/williamwen42/250/head
2025-12-04T09:33:41.8765626Z  * [new branch]              gh/williamwen42/250/orig    -> origin/gh/williamwen42/250/orig
2025-12-04T09:33:41.8767509Z  * [new branch]              gh/williamwen42/279/base    -> origin/gh/williamwen42/279/base
2025-12-04T09:33:41.8768975Z  * [new branch]              gh/williamwen42/279/head    -> origin/gh/williamwen42/279/head
2025-12-04T09:33:41.8770260Z  * [new branch]              gh/williamwen42/279/orig    -> origin/gh/williamwen42/279/orig
2025-12-04T09:33:41.8771996Z  * [new branch]              gh/williamwen42/282/base    -> origin/gh/williamwen42/282/base
2025-12-04T09:33:41.8773281Z  * [new branch]              gh/williamwen42/282/head    -> origin/gh/williamwen42/282/head
2025-12-04T09:33:41.8774514Z  * [new branch]              gh/williamwen42/282/orig    -> origin/gh/williamwen42/282/orig
2025-12-04T09:33:41.8776392Z  * [new branch]              gh/williamwen42/287/base    -> origin/gh/williamwen42/287/base
2025-12-04T09:33:41.8777723Z  * [new branch]              gh/williamwen42/287/head    -> origin/gh/williamwen42/287/head
2025-12-04T09:33:41.8779047Z  * [new branch]              gh/williamwen42/287/orig    -> origin/gh/williamwen42/287/orig
2025-12-04T09:33:41.8780904Z  * [new branch]              gh/williamwen42/288/base    -> origin/gh/williamwen42/288/base
2025-12-04T09:33:41.8782090Z  * [new branch]              gh/williamwen42/288/head    -> origin/gh/williamwen42/288/head
2025-12-04T09:33:41.8783358Z  * [new branch]              gh/williamwen42/288/orig    -> origin/gh/williamwen42/288/orig
2025-12-04T09:33:41.8785382Z  * [new branch]              gh/williamwen42/296/base    -> origin/gh/williamwen42/296/base
2025-12-04T09:33:41.8786815Z  * [new branch]              gh/williamwen42/296/head    -> origin/gh/williamwen42/296/head
2025-12-04T09:33:41.8788153Z  * [new branch]              gh/williamwen42/296/orig    -> origin/gh/williamwen42/296/orig
2025-12-04T09:33:41.8789793Z  * [new branch]              gh/williamwen42/297/base    -> origin/gh/williamwen42/297/base
2025-12-04T09:33:41.8791234Z  * [new branch]              gh/williamwen42/297/head    -> origin/gh/williamwen42/297/head
2025-12-04T09:33:41.8792873Z  * [new branch]              gh/williamwen42/297/orig    -> origin/gh/williamwen42/297/orig
2025-12-04T09:33:41.8794736Z  * [new branch]              gh/williamwen42/306/base    -> origin/gh/williamwen42/306/base
2025-12-04T09:33:41.8796091Z  * [new branch]              gh/williamwen42/306/head    -> origin/gh/williamwen42/306/head
2025-12-04T09:33:41.8797373Z  * [new branch]              gh/williamwen42/306/orig    -> origin/gh/williamwen42/306/orig
2025-12-04T09:33:41.8799174Z  * [new branch]              gh/williamwen42/309/base    -> origin/gh/williamwen42/309/base
2025-12-04T09:33:41.8801224Z  * [new branch]              gh/williamwen42/309/head    -> origin/gh/williamwen42/309/head
2025-12-04T09:33:41.8804930Z  * [new branch]              gh/williamwen42/309/orig    -> origin/gh/williamwen42/309/orig
2025-12-04T09:33:41.8806736Z  * [new branch]              gh/williamwen42/310/base    -> origin/gh/williamwen42/310/base
2025-12-04T09:33:41.8808089Z  * [new branch]              gh/williamwen42/310/head    -> origin/gh/williamwen42/310/head
2025-12-04T09:33:41.8809482Z  * [new branch]              gh/williamwen42/310/orig    -> origin/gh/williamwen42/310/orig
2025-12-04T09:33:41.8812677Z  * [new branch]              gh/williamwen42/311/base    -> origin/gh/williamwen42/311/base
2025-12-04T09:33:41.8813990Z  * [new branch]              gh/williamwen42/311/head    -> origin/gh/williamwen42/311/head
2025-12-04T09:33:41.8815305Z  * [new branch]              gh/williamwen42/311/orig    -> origin/gh/williamwen42/311/orig
2025-12-04T09:33:41.8816894Z  * [new branch]              gh/williamwen42/319/base    -> origin/gh/williamwen42/319/base
2025-12-04T09:33:41.8818136Z  * [new branch]              gh/williamwen42/319/head    -> origin/gh/williamwen42/319/head
2025-12-04T09:33:41.8819440Z  * [new branch]              gh/williamwen42/319/orig    -> origin/gh/williamwen42/319/orig
2025-12-04T09:33:41.8821213Z  * [new branch]              gh/williamwen42/325/base    -> origin/gh/williamwen42/325/base
2025-12-04T09:33:41.8822606Z  * [new branch]              gh/williamwen42/325/head    -> origin/gh/williamwen42/325/head
2025-12-04T09:33:41.8823868Z  * [new branch]              gh/williamwen42/325/orig    -> origin/gh/williamwen42/325/orig
2025-12-04T09:33:41.8825759Z  * [new branch]              gh/williamwen42/326/base    -> origin/gh/williamwen42/326/base
2025-12-04T09:33:41.8827147Z  * [new branch]              gh/williamwen42/326/head    -> origin/gh/williamwen42/326/head
2025-12-04T09:33:41.8828419Z  * [new branch]              gh/williamwen42/326/orig    -> origin/gh/williamwen42/326/orig
2025-12-04T09:33:41.8830240Z  * [new branch]              gh/williamwen42/327/base    -> origin/gh/williamwen42/327/base
2025-12-04T09:33:41.8831568Z  * [new branch]              gh/williamwen42/327/head    -> origin/gh/williamwen42/327/head
2025-12-04T09:33:41.8832857Z  * [new branch]              gh/williamwen42/327/orig    -> origin/gh/williamwen42/327/orig
2025-12-04T09:33:41.8835087Z  * [new branch]              gh/williamwen42/328/base    -> origin/gh/williamwen42/328/base
2025-12-04T09:33:41.8836560Z  * [new branch]              gh/williamwen42/328/head    -> origin/gh/williamwen42/328/head
2025-12-04T09:33:41.8837735Z  * [new branch]              gh/williamwen42/328/orig    -> origin/gh/williamwen42/328/orig
2025-12-04T09:33:41.8840007Z  * [new branch]              gh/williamwen42/329/base    -> origin/gh/williamwen42/329/base
2025-12-04T09:33:41.8841420Z  * [new branch]              gh/williamwen42/329/head    -> origin/gh/williamwen42/329/head
2025-12-04T09:33:41.8842836Z  * [new branch]              gh/williamwen42/329/orig    -> origin/gh/williamwen42/329/orig
2025-12-04T09:33:41.8844843Z  * [new branch]              gh/williamwen42/330/base    -> origin/gh/williamwen42/330/base
2025-12-04T09:33:41.8846154Z  * [new branch]              gh/williamwen42/330/head    -> origin/gh/williamwen42/330/head
2025-12-04T09:33:41.8847439Z  * [new branch]              gh/williamwen42/330/orig    -> origin/gh/williamwen42/330/orig
2025-12-04T09:33:41.8849210Z  * [new branch]              gh/williamwen42/331/base    -> origin/gh/williamwen42/331/base
2025-12-04T09:33:41.8850456Z  * [new branch]              gh/williamwen42/331/head    -> origin/gh/williamwen42/331/head
2025-12-04T09:33:41.8851756Z  * [new branch]              gh/williamwen42/331/orig    -> origin/gh/williamwen42/331/orig
2025-12-04T09:33:41.8853366Z  * [new branch]              gh/williamwen42/332/base    -> origin/gh/williamwen42/332/base
2025-12-04T09:33:41.8854641Z  * [new branch]              gh/williamwen42/332/head    -> origin/gh/williamwen42/332/head
2025-12-04T09:33:41.8855938Z  * [new branch]              gh/williamwen42/332/orig    -> origin/gh/williamwen42/332/orig
2025-12-04T09:33:41.8857953Z  * [new branch]              gh/williamwen42/333/base    -> origin/gh/williamwen42/333/base
2025-12-04T09:33:41.8859182Z  * [new branch]              gh/williamwen42/333/head    -> origin/gh/williamwen42/333/head
2025-12-04T09:33:41.8860484Z  * [new branch]              gh/williamwen42/333/orig    -> origin/gh/williamwen42/333/orig
2025-12-04T09:33:41.8862852Z  * [new branch]              gh/williamwen42/334/base    -> origin/gh/williamwen42/334/base
2025-12-04T09:33:41.8864163Z  * [new branch]              gh/williamwen42/334/head    -> origin/gh/williamwen42/334/head
2025-12-04T09:33:41.8865496Z  * [new branch]              gh/williamwen42/334/orig    -> origin/gh/williamwen42/334/orig
2025-12-04T09:33:41.8871486Z  * [new branch]              gh/williamwen42/335/base    -> origin/gh/williamwen42/335/base
2025-12-04T09:33:41.8872884Z  * [new branch]              gh/williamwen42/335/head    -> origin/gh/williamwen42/335/head
2025-12-04T09:33:41.8874206Z  * [new branch]              gh/williamwen42/335/orig    -> origin/gh/williamwen42/335/orig
2025-12-04T09:33:41.8875984Z  * [new branch]              gh/williamwen42/336/base    -> origin/gh/williamwen42/336/base
2025-12-04T09:33:41.8877194Z  * [new branch]              gh/williamwen42/336/head    -> origin/gh/williamwen42/336/head
2025-12-04T09:33:41.8878393Z  * [new branch]              gh/williamwen42/336/orig    -> origin/gh/williamwen42/336/orig
2025-12-04T09:33:41.8880246Z  * [new branch]              gh/williamwen42/337/base    -> origin/gh/williamwen42/337/base
2025-12-04T09:33:41.8881535Z  * [new branch]              gh/williamwen42/337/head    -> origin/gh/williamwen42/337/head
2025-12-04T09:33:41.8882914Z  * [new branch]              gh/williamwen42/337/orig    -> origin/gh/williamwen42/337/orig
2025-12-04T09:33:41.8884898Z  * [new branch]              gh/williamwen42/338/base    -> origin/gh/williamwen42/338/base
2025-12-04T09:33:41.8886199Z  * [new branch]              gh/williamwen42/338/head    -> origin/gh/williamwen42/338/head
2025-12-04T09:33:41.8887449Z  * [new branch]              gh/williamwen42/338/orig    -> origin/gh/williamwen42/338/orig
2025-12-04T09:33:41.8889168Z  * [new branch]              gh/williamwen42/339/base    -> origin/gh/williamwen42/339/base
2025-12-04T09:33:41.8890526Z  * [new branch]              gh/williamwen42/339/head    -> origin/gh/williamwen42/339/head
2025-12-04T09:33:41.8891771Z  * [new branch]              gh/williamwen42/339/orig    -> origin/gh/williamwen42/339/orig
2025-12-04T09:33:41.8893589Z  * [new branch]              gh/williamwen42/340/base    -> origin/gh/williamwen42/340/base
2025-12-04T09:33:41.8894810Z  * [new branch]              gh/williamwen42/340/head    -> origin/gh/williamwen42/340/head
2025-12-04T09:33:41.8896011Z  * [new branch]              gh/williamwen42/340/orig    -> origin/gh/williamwen42/340/orig
2025-12-04T09:33:41.8897918Z  * [new branch]              gh/williamwen42/341/base    -> origin/gh/williamwen42/341/base
2025-12-04T09:33:41.8899245Z  * [new branch]              gh/williamwen42/341/head    -> origin/gh/williamwen42/341/head
2025-12-04T09:33:41.8900511Z  * [new branch]              gh/williamwen42/341/orig    -> origin/gh/williamwen42/341/orig
2025-12-04T09:33:41.8902597Z  * [new branch]              gh/williamwen42/342/base    -> origin/gh/williamwen42/342/base
2025-12-04T09:33:41.8904340Z  * [new branch]              gh/williamwen42/342/head    -> origin/gh/williamwen42/342/head
2025-12-04T09:33:41.8905633Z  * [new branch]              gh/williamwen42/342/orig    -> origin/gh/williamwen42/342/orig
2025-12-04T09:33:41.8907443Z  * [new branch]              gh/williamwen42/343/base    -> origin/gh/williamwen42/343/base
2025-12-04T09:33:41.8908775Z  * [new branch]              gh/williamwen42/343/head    -> origin/gh/williamwen42/343/head
2025-12-04T09:33:41.8910027Z  * [new branch]              gh/williamwen42/343/orig    -> origin/gh/williamwen42/343/orig
2025-12-04T09:33:41.8911799Z  * [new branch]              gh/williamwen42/344/base    -> origin/gh/williamwen42/344/base
2025-12-04T09:33:41.8913075Z  * [new branch]              gh/williamwen42/344/head    -> origin/gh/williamwen42/344/head
2025-12-04T09:33:41.8914360Z  * [new branch]              gh/williamwen42/344/orig    -> origin/gh/williamwen42/344/orig
2025-12-04T09:33:41.8916186Z  * [new branch]              gh/williamwen42/345/base    -> origin/gh/williamwen42/345/base
2025-12-04T09:33:41.8917480Z  * [new branch]              gh/williamwen42/345/head    -> origin/gh/williamwen42/345/head
2025-12-04T09:33:41.8918760Z  * [new branch]              gh/williamwen42/345/orig    -> origin/gh/williamwen42/345/orig
2025-12-04T09:33:41.8920640Z  * [new branch]              gh/williamwen42/346/base    -> origin/gh/williamwen42/346/base
2025-12-04T09:33:41.8921974Z  * [new branch]              gh/williamwen42/346/head    -> origin/gh/williamwen42/346/head
2025-12-04T09:33:41.8923428Z  * [new branch]              gh/williamwen42/346/orig    -> origin/gh/williamwen42/346/orig
2025-12-04T09:33:41.8925285Z  * [new branch]              gh/williamwen42/347/base    -> origin/gh/williamwen42/347/base
2025-12-04T09:33:41.8926603Z  * [new branch]              gh/williamwen42/347/head    -> origin/gh/williamwen42/347/head
2025-12-04T09:33:41.8927837Z  * [new branch]              gh/williamwen42/347/orig    -> origin/gh/williamwen42/347/orig
2025-12-04T09:33:41.8929524Z  * [new branch]              gh/williamwen42/348/base    -> origin/gh/williamwen42/348/base
2025-12-04T09:33:41.8930700Z  * [new branch]              gh/williamwen42/348/head    -> origin/gh/williamwen42/348/head
2025-12-04T09:33:41.8931952Z  * [new branch]              gh/williamwen42/348/orig    -> origin/gh/williamwen42/348/orig
2025-12-04T09:33:41.8933977Z  * [new branch]              gh/williamwen42/349/base    -> origin/gh/williamwen42/349/base
2025-12-04T09:33:41.8935319Z  * [new branch]              gh/williamwen42/349/head    -> origin/gh/williamwen42/349/head
2025-12-04T09:33:41.8936577Z  * [new branch]              gh/williamwen42/349/orig    -> origin/gh/williamwen42/349/orig
2025-12-04T09:33:41.8938516Z  * [new branch]              gh/williamwen42/350/base    -> origin/gh/williamwen42/350/base
2025-12-04T09:33:41.8939795Z  * [new branch]              gh/williamwen42/350/head    -> origin/gh/williamwen42/350/head
2025-12-04T09:33:41.8941248Z  * [new branch]              gh/williamwen42/350/orig    -> origin/gh/williamwen42/350/orig
2025-12-04T09:33:41.8942889Z  * [new branch]              gh/williamwen42/351/base    -> origin/gh/williamwen42/351/base
2025-12-04T09:33:41.8944278Z  * [new branch]              gh/williamwen42/351/head    -> origin/gh/williamwen42/351/head
2025-12-04T09:33:41.8945589Z  * [new branch]              gh/williamwen42/351/orig    -> origin/gh/williamwen42/351/orig
2025-12-04T09:33:41.8947336Z  * [new branch]              gh/williamwen42/352/base    -> origin/gh/williamwen42/352/base
2025-12-04T09:33:41.8948610Z  * [new branch]              gh/williamwen42/352/head    -> origin/gh/williamwen42/352/head
2025-12-04T09:33:41.8949883Z  * [new branch]              gh/williamwen42/352/orig    -> origin/gh/williamwen42/352/orig
2025-12-04T09:33:41.8951741Z  * [new branch]              gh/williamwen42/353/base    -> origin/gh/williamwen42/353/base
2025-12-04T09:33:41.8953055Z  * [new branch]              gh/williamwen42/353/head    -> origin/gh/williamwen42/353/head
2025-12-04T09:33:41.8954357Z  * [new branch]              gh/williamwen42/353/orig    -> origin/gh/williamwen42/353/orig
2025-12-04T09:33:41.8956135Z  * [new branch]              gh/williamwen42/354/base    -> origin/gh/williamwen42/354/base
2025-12-04T09:33:41.8957521Z  * [new branch]              gh/williamwen42/354/head    -> origin/gh/williamwen42/354/head
2025-12-04T09:33:41.8958793Z  * [new branch]              gh/williamwen42/354/orig    -> origin/gh/williamwen42/354/orig
2025-12-04T09:33:41.8960579Z  * [new branch]              gh/williamwen42/355/base    -> origin/gh/williamwen42/355/base
2025-12-04T09:33:41.8970904Z  * [new branch]              gh/williamwen42/355/head    -> origin/gh/williamwen42/355/head
2025-12-04T09:33:41.8971569Z  * [new branch]              gh/williamwen42/355/orig    -> origin/gh/williamwen42/355/orig
2025-12-04T09:33:41.8971868Z  * [new branch]              gh/williamwen42/356/base    -> origin/gh/williamwen42/356/base
2025-12-04T09:33:41.8972177Z  * [new branch]              gh/williamwen42/356/head    -> origin/gh/williamwen42/356/head
2025-12-04T09:33:41.8972453Z  * [new branch]              gh/williamwen42/356/orig    -> origin/gh/williamwen42/356/orig
2025-12-04T09:33:41.8972728Z  * [new branch]              gh/williamwen42/357/base    -> origin/gh/williamwen42/357/base
2025-12-04T09:33:41.8973037Z  * [new branch]              gh/williamwen42/357/head    -> origin/gh/williamwen42/357/head
2025-12-04T09:33:41.8973313Z  * [new branch]              gh/williamwen42/357/orig    -> origin/gh/williamwen42/357/orig
2025-12-04T09:33:41.8974862Z  * [new branch]              gh/williamwen42/358/base    -> origin/gh/williamwen42/358/base
2025-12-04T09:33:41.8976139Z  * [new branch]              gh/williamwen42/358/head    -> origin/gh/williamwen42/358/head
2025-12-04T09:33:41.8977536Z  * [new branch]              gh/williamwen42/358/orig    -> origin/gh/williamwen42/358/orig
2025-12-04T09:33:41.8979501Z  * [new branch]              gh/xmfan/169/base           -> origin/gh/xmfan/169/base
2025-12-04T09:33:41.8980799Z  * [new branch]              gh/xmfan/169/head           -> origin/gh/xmfan/169/head
2025-12-04T09:33:41.8982390Z  * [new branch]              gh/xmfan/170/base           -> origin/gh/xmfan/170/base
2025-12-04T09:33:41.8983539Z  * [new branch]              gh/xmfan/170/head           -> origin/gh/xmfan/170/head
2025-12-04T09:33:41.8985229Z  * [new branch]              gh/xmfan/274/base           -> origin/gh/xmfan/274/base
2025-12-04T09:33:41.8986467Z  * [new branch]              gh/xmfan/274/head           -> origin/gh/xmfan/274/head
2025-12-04T09:33:41.8987754Z  * [new branch]              gh/xmfan/274/orig           -> origin/gh/xmfan/274/orig
2025-12-04T09:33:41.8989385Z  * [new branch]              gh/xmfan/277/base           -> origin/gh/xmfan/277/base
2025-12-04T09:33:41.8990738Z  * [new branch]              gh/xmfan/277/head           -> origin/gh/xmfan/277/head
2025-12-04T09:33:41.8992046Z  * [new branch]              gh/xmfan/277/orig           -> origin/gh/xmfan/277/orig
2025-12-04T09:33:41.8994223Z  * [new branch]              gh/xmfan/301/base           -> origin/gh/xmfan/301/base
2025-12-04T09:33:41.8995373Z  * [new branch]              gh/xmfan/301/head           -> origin/gh/xmfan/301/head
2025-12-04T09:33:41.8996618Z  * [new branch]              gh/xmfan/301/orig           -> origin/gh/xmfan/301/orig
2025-12-04T09:33:41.8998709Z  * [new branch]              gh/xmfan/304/base           -> origin/gh/xmfan/304/base
2025-12-04T09:33:41.8999997Z  * [new branch]              gh/xmfan/304/head           -> origin/gh/xmfan/304/head
2025-12-04T09:33:41.9001288Z  * [new branch]              gh/xmfan/304/orig           -> origin/gh/xmfan/304/orig
2025-12-04T09:33:41.9003379Z  * [new branch]              gh/xmfan/309/base           -> origin/gh/xmfan/309/base
2025-12-04T09:33:41.9004508Z  * [new branch]              gh/xmfan/309/head           -> origin/gh/xmfan/309/head
2025-12-04T09:33:41.9006256Z  * [new branch]              gh/xmfan/309/orig           -> origin/gh/xmfan/309/orig
2025-12-04T09:33:41.9007944Z  * [new branch]              gh/xmfan/310/base           -> origin/gh/xmfan/310/base
2025-12-04T09:33:41.9009364Z  * [new branch]              gh/xmfan/310/head           -> origin/gh/xmfan/310/head
2025-12-04T09:33:41.9010590Z  * [new branch]              gh/xmfan/310/orig           -> origin/gh/xmfan/310/orig
2025-12-04T09:33:41.9012256Z  * [new branch]              gh/xmfan/311/base           -> origin/gh/xmfan/311/base
2025-12-04T09:33:41.9013487Z  * [new branch]              gh/xmfan/311/head           -> origin/gh/xmfan/311/head
2025-12-04T09:33:41.9014733Z  * [new branch]              gh/xmfan/311/orig           -> origin/gh/xmfan/311/orig
2025-12-04T09:33:41.9016510Z  * [new branch]              gh/xmfan/312/base           -> origin/gh/xmfan/312/base
2025-12-04T09:33:41.9017766Z  * [new branch]              gh/xmfan/312/head           -> origin/gh/xmfan/312/head
2025-12-04T09:33:41.9019079Z  * [new branch]              gh/xmfan/312/orig           -> origin/gh/xmfan/312/orig
2025-12-04T09:33:41.9020759Z  * [new branch]              gh/xmfan/313/base           -> origin/gh/xmfan/313/base
2025-12-04T09:33:41.9022038Z  * [new branch]              gh/xmfan/313/head           -> origin/gh/xmfan/313/head
2025-12-04T09:33:41.9023307Z  * [new branch]              gh/xmfan/313/orig           -> origin/gh/xmfan/313/orig
2025-12-04T09:33:41.9025471Z  * [new branch]              gh/xuanzhang816/27/base     -> origin/gh/xuanzhang816/27/base
2025-12-04T09:33:41.9026797Z  * [new branch]              gh/xuanzhang816/27/head     -> origin/gh/xuanzhang816/27/head
2025-12-04T09:33:41.9028015Z  * [new branch]              gh/xuanzhang816/27/orig     -> origin/gh/xuanzhang816/27/orig
2025-12-04T09:33:41.9029890Z  * [new branch]              gh/xuanzhang816/32/base     -> origin/gh/xuanzhang816/32/base
2025-12-04T09:33:41.9031484Z  * [new branch]              gh/xuanzhang816/32/head     -> origin/gh/xuanzhang816/32/head
2025-12-04T09:33:41.9032755Z  * [new branch]              gh/xuanzhang816/32/orig     -> origin/gh/xuanzhang816/32/orig
2025-12-04T09:33:41.9034500Z  * [new branch]              gh/xuanzhang816/33/base     -> origin/gh/xuanzhang816/33/base
2025-12-04T09:33:41.9035728Z  * [new branch]              gh/xuanzhang816/33/head     -> origin/gh/xuanzhang816/33/head
2025-12-04T09:33:41.9037051Z  * [new branch]              gh/xuanzhang816/33/orig     -> origin/gh/xuanzhang816/33/orig
2025-12-04T09:33:41.9039116Z  * [new branch]              gh/xuanzhang816/34/base     -> origin/gh/xuanzhang816/34/base
2025-12-04T09:33:41.9040432Z  * [new branch]              gh/xuanzhang816/34/head     -> origin/gh/xuanzhang816/34/head
2025-12-04T09:33:41.9041728Z  * [new branch]              gh/xuanzhang816/34/orig     -> origin/gh/xuanzhang816/34/orig
2025-12-04T09:33:41.9043937Z  * [new branch]              gh/xuanzhang816/35/base     -> origin/gh/xuanzhang816/35/base
2025-12-04T09:33:41.9045195Z  * [new branch]              gh/xuanzhang816/35/head     -> origin/gh/xuanzhang816/35/head
2025-12-04T09:33:41.9046550Z  * [new branch]              gh/xuanzhang816/35/orig     -> origin/gh/xuanzhang816/35/orig
2025-12-04T09:33:41.9048633Z  * [new branch]              gh/yanbing-j/11/base        -> origin/gh/yanbing-j/11/base
2025-12-04T09:33:41.9049955Z  * [new branch]              gh/yanbing-j/11/head        -> origin/gh/yanbing-j/11/head
2025-12-04T09:33:41.9051217Z  * [new branch]              gh/yanbing-j/11/orig        -> origin/gh/yanbing-j/11/orig
2025-12-04T09:33:41.9052921Z  * [new branch]              gh/yanbing-j/12/base        -> origin/gh/yanbing-j/12/base
2025-12-04T09:33:41.9054180Z  * [new branch]              gh/yanbing-j/12/head        -> origin/gh/yanbing-j/12/head
2025-12-04T09:33:41.9055468Z  * [new branch]              gh/yanbing-j/12/orig        -> origin/gh/yanbing-j/12/orig
2025-12-04T09:33:41.9057189Z  * [new branch]              gh/yanbing-j/13/base        -> origin/gh/yanbing-j/13/base
2025-12-04T09:33:41.9058496Z  * [new branch]              gh/yanbing-j/13/head        -> origin/gh/yanbing-j/13/head
2025-12-04T09:33:41.9059801Z  * [new branch]              gh/yanbing-j/13/orig        -> origin/gh/yanbing-j/13/orig
2025-12-04T09:33:41.9061599Z  * [new branch]              gh/yanbing-j/14/base        -> origin/gh/yanbing-j/14/base
2025-12-04T09:33:41.9062854Z  * [new branch]              gh/yanbing-j/14/head        -> origin/gh/yanbing-j/14/head
2025-12-04T09:33:41.9064142Z  * [new branch]              gh/yanbing-j/14/orig        -> origin/gh/yanbing-j/14/orig
2025-12-04T09:33:41.9065706Z  * [new branch]              gh/yanbing-j/15/base        -> origin/gh/yanbing-j/15/base
2025-12-04T09:33:41.9067001Z  * [new branch]              gh/yanbing-j/15/head        -> origin/gh/yanbing-j/15/head
2025-12-04T09:33:41.9068227Z  * [new branch]              gh/yanbing-j/15/orig        -> origin/gh/yanbing-j/15/orig
2025-12-04T09:33:41.9069820Z  * [new branch]              gh/yanbing-j/18/base        -> origin/gh/yanbing-j/18/base
2025-12-04T09:33:41.9071088Z  * [new branch]              gh/yanbing-j/18/head        -> origin/gh/yanbing-j/18/head
2025-12-04T09:33:41.9072395Z  * [new branch]              gh/yanbing-j/18/orig        -> origin/gh/yanbing-j/18/orig
2025-12-04T09:33:41.9074088Z  * [new branch]              gh/yanbing-j/19/base        -> origin/gh/yanbing-j/19/base
2025-12-04T09:33:41.9075390Z  * [new branch]              gh/yanbing-j/19/head        -> origin/gh/yanbing-j/19/head
2025-12-04T09:33:41.9076614Z  * [new branch]              gh/yanbing-j/19/orig        -> origin/gh/yanbing-j/19/orig
2025-12-04T09:33:41.9078423Z  * [new branch]              gh/yanbing-j/20/base        -> origin/gh/yanbing-j/20/base
2025-12-04T09:33:41.9079673Z  * [new branch]              gh/yanbing-j/20/head        -> origin/gh/yanbing-j/20/head
2025-12-04T09:33:41.9080913Z  * [new branch]              gh/yanbing-j/20/orig        -> origin/gh/yanbing-j/20/orig
2025-12-04T09:33:41.9082691Z  * [new branch]              gh/yanbing-j/21/base        -> origin/gh/yanbing-j/21/base
2025-12-04T09:33:41.9084128Z  * [new branch]              gh/yanbing-j/21/head        -> origin/gh/yanbing-j/21/head
2025-12-04T09:33:41.9085802Z  * [new branch]              gh/yanbing-j/22/base        -> origin/gh/yanbing-j/22/base
2025-12-04T09:33:41.9087028Z  * [new branch]              gh/yanbing-j/22/head        -> origin/gh/yanbing-j/22/head
2025-12-04T09:33:41.9088348Z  * [new branch]              gh/yanbing-j/22/orig        -> origin/gh/yanbing-j/22/orig
2025-12-04T09:33:41.9090091Z  * [new branch]              gh/yanbing-j/23/base        -> origin/gh/yanbing-j/23/base
2025-12-04T09:33:41.9091364Z  * [new branch]              gh/yanbing-j/23/head        -> origin/gh/yanbing-j/23/head
2025-12-04T09:33:41.9092655Z  * [new branch]              gh/yanbing-j/23/orig        -> origin/gh/yanbing-j/23/orig
2025-12-04T09:33:41.9094433Z  * [new branch]              gh/yanbing-j/24/base        -> origin/gh/yanbing-j/24/base
2025-12-04T09:33:41.9095734Z  * [new branch]              gh/yanbing-j/24/head        -> origin/gh/yanbing-j/24/head
2025-12-04T09:33:41.9097073Z  * [new branch]              gh/yanbing-j/24/orig        -> origin/gh/yanbing-j/24/orig
2025-12-04T09:33:41.9098778Z  * [new branch]              gh/yanbing-j/25/base        -> origin/gh/yanbing-j/25/base
2025-12-04T09:33:41.9100050Z  * [new branch]              gh/yanbing-j/25/head        -> origin/gh/yanbing-j/25/head
2025-12-04T09:33:41.9101331Z  * [new branch]              gh/yanbing-j/25/orig        -> origin/gh/yanbing-j/25/orig
2025-12-04T09:33:41.9103154Z  * [new branch]              gh/yanbing-j/26/base        -> origin/gh/yanbing-j/26/base
2025-12-04T09:33:41.9104384Z  * [new branch]              gh/yanbing-j/26/head        -> origin/gh/yanbing-j/26/head
2025-12-04T09:33:41.9105648Z  * [new branch]              gh/yanbing-j/26/orig        -> origin/gh/yanbing-j/26/orig
2025-12-04T09:33:41.9107850Z  * [new branch]              gh/yang-yu-hang/1/base      -> origin/gh/yang-yu-hang/1/base
2025-12-04T09:33:41.9109306Z  * [new branch]              gh/yang-yu-hang/1/head      -> origin/gh/yang-yu-hang/1/head
2025-12-04T09:33:41.9110781Z  * [new branch]              gh/yang-yu-hang/1/orig      -> origin/gh/yang-yu-hang/1/orig
2025-12-04T09:33:41.9112545Z  * [new branch]              gh/yang-yu-hang/2/base      -> origin/gh/yang-yu-hang/2/base
2025-12-04T09:33:41.9114142Z  * [new branch]              gh/yang-yu-hang/2/head      -> origin/gh/yang-yu-hang/2/head
2025-12-04T09:33:41.9115726Z  * [new branch]              gh/yang-yu-hang/2/orig      -> origin/gh/yang-yu-hang/2/orig
2025-12-04T09:33:41.9117447Z  * [new branch]              gh/yang-yu-hang/3/base      -> origin/gh/yang-yu-hang/3/base
2025-12-04T09:33:41.9118762Z  * [new branch]              gh/yang-yu-hang/3/head      -> origin/gh/yang-yu-hang/3/head
2025-12-04T09:33:41.9120089Z  * [new branch]              gh/yang-yu-hang/3/orig      -> origin/gh/yang-yu-hang/3/orig
2025-12-04T09:33:41.9122080Z  * [new branch]              gh/yangw-dev/12/base        -> origin/gh/yangw-dev/12/base
2025-12-04T09:33:41.9123512Z  * [new branch]              gh/yangw-dev/12/head        -> origin/gh/yangw-dev/12/head
2025-12-04T09:33:41.9124786Z  * [new branch]              gh/yangw-dev/12/orig        -> origin/gh/yangw-dev/12/orig
2025-12-04T09:33:41.9126465Z  * [new branch]              gh/yangw-dev/13/base        -> origin/gh/yangw-dev/13/base
2025-12-04T09:33:41.9127803Z  * [new branch]              gh/yangw-dev/13/head        -> origin/gh/yangw-dev/13/head
2025-12-04T09:33:41.9129160Z  * [new branch]              gh/yangw-dev/13/orig        -> origin/gh/yangw-dev/13/orig
2025-12-04T09:33:41.9130859Z  * [new branch]              gh/yangw-dev/14/base        -> origin/gh/yangw-dev/14/base
2025-12-04T09:33:41.9132121Z  * [new branch]              gh/yangw-dev/14/head        -> origin/gh/yangw-dev/14/head
2025-12-04T09:33:41.9133350Z  * [new branch]              gh/yangw-dev/14/orig        -> origin/gh/yangw-dev/14/orig
2025-12-04T09:33:41.9135038Z  * [new branch]              gh/yangw-dev/15/base        -> origin/gh/yangw-dev/15/base
2025-12-04T09:33:41.9136350Z  * [new branch]              gh/yangw-dev/15/head        -> origin/gh/yangw-dev/15/head
2025-12-04T09:33:41.9138006Z  * [new branch]              gh/yangw-dev/15/orig        -> origin/gh/yangw-dev/15/orig
2025-12-04T09:33:41.9139667Z  * [new branch]              gh/yangw-dev/19/base        -> origin/gh/yangw-dev/19/base
2025-12-04T09:33:41.9140935Z  * [new branch]              gh/yangw-dev/19/head        -> origin/gh/yangw-dev/19/head
2025-12-04T09:33:41.9142352Z  * [new branch]              gh/yangw-dev/19/orig        -> origin/gh/yangw-dev/19/orig
2025-12-04T09:33:41.9143997Z  * [new branch]              gh/yangw-dev/26/base        -> origin/gh/yangw-dev/26/base
2025-12-04T09:33:41.9145276Z  * [new branch]              gh/yangw-dev/26/head        -> origin/gh/yangw-dev/26/head
2025-12-04T09:33:41.9146582Z  * [new branch]              gh/yangw-dev/26/orig        -> origin/gh/yangw-dev/26/orig
2025-12-04T09:33:41.9148251Z  * [new branch]              gh/yangw-dev/27/base        -> origin/gh/yangw-dev/27/base
2025-12-04T09:33:41.9149674Z  * [new branch]              gh/yangw-dev/27/head        -> origin/gh/yangw-dev/27/head
2025-12-04T09:33:41.9150806Z  * [new branch]              gh/yangw-dev/27/orig        -> origin/gh/yangw-dev/27/orig
2025-12-04T09:33:41.9152890Z  * [new branch]              gh/ydwu4/292/base           -> origin/gh/ydwu4/292/base
2025-12-04T09:33:41.9154119Z  * [new branch]              gh/ydwu4/292/head           -> origin/gh/ydwu4/292/head
2025-12-04T09:33:41.9155308Z  * [new branch]              gh/ydwu4/292/orig           -> origin/gh/ydwu4/292/orig
2025-12-04T09:33:41.9157030Z  * [new branch]              gh/ydwu4/294/base           -> origin/gh/ydwu4/294/base
2025-12-04T09:33:41.9158274Z  * [new branch]              gh/ydwu4/294/head           -> origin/gh/ydwu4/294/head
2025-12-04T09:33:41.9159577Z  * [new branch]              gh/ydwu4/294/orig           -> origin/gh/ydwu4/294/orig
2025-12-04T09:33:41.9161516Z  * [new branch]              gh/ydwu4/295/base           -> origin/gh/ydwu4/295/base
2025-12-04T09:33:41.9163119Z  * [new branch]              gh/ydwu4/295/head           -> origin/gh/ydwu4/295/head
2025-12-04T09:33:41.9164372Z  * [new branch]              gh/ydwu4/295/orig           -> origin/gh/ydwu4/295/orig
2025-12-04T09:33:41.9165994Z  * [new branch]              gh/ydwu4/296/base           -> origin/gh/ydwu4/296/base
2025-12-04T09:33:41.9167145Z  * [new branch]              gh/ydwu4/296/head           -> origin/gh/ydwu4/296/head
2025-12-04T09:33:41.9168435Z  * [new branch]              gh/ydwu4/296/orig           -> origin/gh/ydwu4/296/orig
2025-12-04T09:33:41.9170207Z  * [new branch]              gh/ydwu4/306/base           -> origin/gh/ydwu4/306/base
2025-12-04T09:33:41.9171972Z  * [new branch]              gh/ydwu4/306/head           -> origin/gh/ydwu4/306/head
2025-12-04T09:33:41.9173360Z  * [new branch]              gh/ydwu4/306/orig           -> origin/gh/ydwu4/306/orig
2025-12-04T09:33:41.9175045Z  * [new branch]              gh/ydwu4/312/base           -> origin/gh/ydwu4/312/base
2025-12-04T09:33:41.9176317Z  * [new branch]              gh/ydwu4/312/head           -> origin/gh/ydwu4/312/head
2025-12-04T09:33:41.9177545Z  * [new branch]              gh/ydwu4/312/orig           -> origin/gh/ydwu4/312/orig
2025-12-04T09:33:41.9179236Z  * [new branch]              gh/ydwu4/322/base           -> origin/gh/ydwu4/322/base
2025-12-04T09:33:41.9180618Z  * [new branch]              gh/ydwu4/322/head           -> origin/gh/ydwu4/322/head
2025-12-04T09:33:41.9181869Z  * [new branch]              gh/ydwu4/322/orig           -> origin/gh/ydwu4/322/orig
2025-12-04T09:33:41.9183549Z  * [new branch]              gh/ydwu4/327/base           -> origin/gh/ydwu4/327/base
2025-12-04T09:33:41.9184905Z  * [new branch]              gh/ydwu4/327/head           -> origin/gh/ydwu4/327/head
2025-12-04T09:33:41.9186222Z  * [new branch]              gh/ydwu4/327/orig           -> origin/gh/ydwu4/327/orig
2025-12-04T09:33:41.9188009Z  * [new branch]              gh/ydwu4/328/base           -> origin/gh/ydwu4/328/base
2025-12-04T09:33:41.9189644Z  * [new branch]              gh/ydwu4/328/head           -> origin/gh/ydwu4/328/head
2025-12-04T09:33:41.9190899Z  * [new branch]              gh/ydwu4/328/orig           -> origin/gh/ydwu4/328/orig
2025-12-04T09:33:41.9192862Z  * [new branch]              gh/ydwu4/329/base           -> origin/gh/ydwu4/329/base
2025-12-04T09:33:41.9194160Z  * [new branch]              gh/ydwu4/329/head           -> origin/gh/ydwu4/329/head
2025-12-04T09:33:41.9195403Z  * [new branch]              gh/ydwu4/329/orig           -> origin/gh/ydwu4/329/orig
2025-12-04T09:33:41.9197235Z  * [new branch]              gh/ydwu4/330/base           -> origin/gh/ydwu4/330/base
2025-12-04T09:33:41.9198556Z  * [new branch]              gh/ydwu4/330/head           -> origin/gh/ydwu4/330/head
2025-12-04T09:33:41.9199789Z  * [new branch]              gh/ydwu4/330/orig           -> origin/gh/ydwu4/330/orig
2025-12-04T09:33:41.9201491Z  * [new branch]              gh/ydwu4/331/base           -> origin/gh/ydwu4/331/base
2025-12-04T09:33:41.9206088Z  * [new branch]              gh/ydwu4/331/head           -> origin/gh/ydwu4/331/head
2025-12-04T09:33:41.9207279Z  * [new branch]              gh/ydwu4/331/orig           -> origin/gh/ydwu4/331/orig
2025-12-04T09:33:41.9208776Z  * [new branch]              gh/ydwu4/332/base           -> origin/gh/ydwu4/332/base
2025-12-04T09:33:41.9210051Z  * [new branch]              gh/ydwu4/332/head           -> origin/gh/ydwu4/332/head
2025-12-04T09:33:41.9211329Z  * [new branch]              gh/ydwu4/332/orig           -> origin/gh/ydwu4/332/orig
2025-12-04T09:33:41.9212835Z  * [new branch]              gh/ydwu4/333/base           -> origin/gh/ydwu4/333/base
2025-12-04T09:33:41.9214612Z  * [new branch]              gh/ydwu4/333/head           -> origin/gh/ydwu4/333/head
2025-12-04T09:33:41.9215883Z  * [new branch]              gh/ydwu4/333/orig           -> origin/gh/ydwu4/333/orig
2025-12-04T09:33:41.9217421Z  * [new branch]              gh/ydwu4/334/base           -> origin/gh/ydwu4/334/base
2025-12-04T09:33:41.9218865Z  * [new branch]              gh/ydwu4/334/head           -> origin/gh/ydwu4/334/head
2025-12-04T09:33:41.9220120Z  * [new branch]              gh/ydwu4/334/orig           -> origin/gh/ydwu4/334/orig
2025-12-04T09:33:41.9221659Z  * [new branch]              gh/ydwu4/335/base           -> origin/gh/ydwu4/335/base
2025-12-04T09:33:41.9222891Z  * [new branch]              gh/ydwu4/335/head           -> origin/gh/ydwu4/335/head
2025-12-04T09:33:41.9224175Z  * [new branch]              gh/ydwu4/335/orig           -> origin/gh/ydwu4/335/orig
2025-12-04T09:33:41.9226259Z  * [new branch]              gh/ydwu4/337/base           -> origin/gh/ydwu4/337/base
2025-12-04T09:33:41.9227564Z  * [new branch]              gh/ydwu4/337/head           -> origin/gh/ydwu4/337/head
2025-12-04T09:33:41.9228827Z  * [new branch]              gh/ydwu4/337/orig           -> origin/gh/ydwu4/337/orig
2025-12-04T09:33:41.9230638Z  * [new branch]              gh/ydwu4/339/base           -> origin/gh/ydwu4/339/base
2025-12-04T09:33:41.9231990Z  * [new branch]              gh/ydwu4/339/head           -> origin/gh/ydwu4/339/head
2025-12-04T09:33:41.9233179Z  * [new branch]              gh/ydwu4/339/orig           -> origin/gh/ydwu4/339/orig
2025-12-04T09:33:41.9235467Z  * [new branch]              gh/yf225/133/base           -> origin/gh/yf225/133/base
2025-12-04T09:33:41.9236709Z  * [new branch]              gh/yf225/133/head           -> origin/gh/yf225/133/head
2025-12-04T09:33:41.9238428Z  * [new branch]              gh/yf225/93/base            -> origin/gh/yf225/93/base
2025-12-04T09:33:41.9239672Z  * [new branch]              gh/yf225/93/head            -> origin/gh/yf225/93/head
2025-12-04T09:33:41.9242425Z  * [new branch]              gh/yifuwang/152/base        -> origin/gh/yifuwang/152/base
2025-12-04T09:33:41.9244209Z  * [new branch]              gh/yifuwang/152/head        -> origin/gh/yifuwang/152/head
2025-12-04T09:33:41.9245556Z  * [new branch]              gh/yifuwang/152/orig        -> origin/gh/yifuwang/152/orig
2025-12-04T09:33:41.9247236Z  * [new branch]              gh/yifuwang/195/base        -> origin/gh/yifuwang/195/base
2025-12-04T09:33:41.9248545Z  * [new branch]              gh/yifuwang/195/head        -> origin/gh/yifuwang/195/head
2025-12-04T09:33:41.9249891Z  * [new branch]              gh/yifuwang/195/orig        -> origin/gh/yifuwang/195/orig
2025-12-04T09:33:41.9252138Z  * [new branch]              gh/yiming0416/1/base        -> origin/gh/yiming0416/1/base
2025-12-04T09:33:41.9253397Z  * [new branch]              gh/yiming0416/1/head        -> origin/gh/yiming0416/1/head
2025-12-04T09:33:41.9254943Z  * [new branch]              gh/yiming0416/2/base        -> origin/gh/yiming0416/2/base
2025-12-04T09:33:41.9256104Z  * [new branch]              gh/yiming0416/2/head        -> origin/gh/yiming0416/2/head
2025-12-04T09:33:41.9258167Z  * [new branch]              gh/yushangdi/1/base         -> origin/gh/yushangdi/1/base
2025-12-04T09:33:41.9259485Z  * [new branch]              gh/yushangdi/1/head         -> origin/gh/yushangdi/1/head
2025-12-04T09:33:41.9261337Z  * [new branch]              gh/yushangdi/10/base        -> origin/gh/yushangdi/10/base
2025-12-04T09:33:41.9262635Z  * [new branch]              gh/yushangdi/10/head        -> origin/gh/yushangdi/10/head
2025-12-04T09:33:41.9263966Z  * [new branch]              gh/yushangdi/10/orig        -> origin/gh/yushangdi/10/orig
2025-12-04T09:33:41.9265615Z  * [new branch]              gh/yushangdi/11/base        -> origin/gh/yushangdi/11/base
2025-12-04T09:33:41.9266864Z  * [new branch]              gh/yushangdi/11/head        -> origin/gh/yushangdi/11/head
2025-12-04T09:33:41.9268294Z  * [new branch]              gh/yushangdi/11/orig        -> origin/gh/yushangdi/11/orig
2025-12-04T09:33:41.9269816Z  * [new branch]              gh/yushangdi/2/base         -> origin/gh/yushangdi/2/base
2025-12-04T09:33:41.9271008Z  * [new branch]              gh/yushangdi/2/head         -> origin/gh/yushangdi/2/head
2025-12-04T09:33:41.9272796Z  * [new branch]              gh/yushangdi/7/base         -> origin/gh/yushangdi/7/base
2025-12-04T09:33:41.9274018Z  * [new branch]              gh/yushangdi/7/head         -> origin/gh/yushangdi/7/head
2025-12-04T09:33:41.9275307Z  * [new branch]              gh/yushangdi/7/orig         -> origin/gh/yushangdi/7/orig
2025-12-04T09:33:41.9277312Z  * [new branch]              gh/yushangdi/8/base         -> origin/gh/yushangdi/8/base
2025-12-04T09:33:41.9278758Z  * [new branch]              gh/yushangdi/8/head         -> origin/gh/yushangdi/8/head
2025-12-04T09:33:41.9280087Z  * [new branch]              gh/yushangdi/8/orig         -> origin/gh/yushangdi/8/orig
2025-12-04T09:33:41.9281641Z  * [new branch]              gh/yushangdi/9/base         -> origin/gh/yushangdi/9/base
2025-12-04T09:33:41.9283104Z  * [new branch]              gh/yushangdi/9/head         -> origin/gh/yushangdi/9/head
2025-12-04T09:33:41.9284385Z  * [new branch]              gh/yushangdi/9/orig         -> origin/gh/yushangdi/9/orig
2025-12-04T09:33:41.9286556Z  * [new branch]              gh/zklaus/19/base           -> origin/gh/zklaus/19/base
2025-12-04T09:33:41.9287828Z  * [new branch]              gh/zklaus/19/head           -> origin/gh/zklaus/19/head
2025-12-04T09:33:41.9289076Z  * [new branch]              gh/zklaus/19/orig           -> origin/gh/zklaus/19/orig
2025-12-04T09:33:41.9290790Z  * [new branch]              gh/zklaus/20/base           -> origin/gh/zklaus/20/base
2025-12-04T09:33:41.9292062Z  * [new branch]              gh/zklaus/20/head           -> origin/gh/zklaus/20/head
2025-12-04T09:33:41.9293352Z  * [new branch]              gh/zklaus/20/orig           -> origin/gh/zklaus/20/orig
2025-12-04T09:33:41.9295064Z  * [new branch]              gh/zklaus/21/base           -> origin/gh/zklaus/21/base
2025-12-04T09:33:41.9296353Z  * [new branch]              gh/zklaus/21/head           -> origin/gh/zklaus/21/head
2025-12-04T09:33:41.9297595Z  * [new branch]              gh/zklaus/21/orig           -> origin/gh/zklaus/21/orig
2025-12-04T09:33:41.9299223Z  * [new branch]              gh/zklaus/22/base           -> origin/gh/zklaus/22/base
2025-12-04T09:33:41.9300464Z  * [new branch]              gh/zklaus/22/head           -> origin/gh/zklaus/22/head
2025-12-04T09:33:41.9302004Z  * [new branch]              gh/zklaus/22/orig           -> origin/gh/zklaus/22/orig
2025-12-04T09:33:41.9303716Z  * [new branch]              gh/zklaus/23/base           -> origin/gh/zklaus/23/base
2025-12-04T09:33:41.9304971Z  * [new branch]              gh/zklaus/23/head           -> origin/gh/zklaus/23/head
2025-12-04T09:33:41.9306251Z  * [new branch]              gh/zklaus/23/orig           -> origin/gh/zklaus/23/orig
2025-12-04T09:33:41.9307801Z  * [new branch]              gh/zklaus/24/base           -> origin/gh/zklaus/24/base
2025-12-04T09:33:41.9309089Z  * [new branch]              gh/zklaus/24/head           -> origin/gh/zklaus/24/head
2025-12-04T09:33:41.9310339Z  * [new branch]              gh/zklaus/24/orig           -> origin/gh/zklaus/24/orig
2025-12-04T09:33:41.9312633Z  * [new branch]              gh/zou3519/1197/base        -> origin/gh/zou3519/1197/base
2025-12-04T09:33:41.9313841Z  * [new branch]              gh/zou3519/1197/head        -> origin/gh/zou3519/1197/head
2025-12-04T09:33:41.9315073Z  * [new branch]              gh/zou3519/1197/orig        -> origin/gh/zou3519/1197/orig
2025-12-04T09:33:41.9317158Z  * [new branch]              gh/zou3519/1199/base        -> origin/gh/zou3519/1199/base
2025-12-04T09:33:41.9318544Z  * [new branch]              gh/zou3519/1199/head        -> origin/gh/zou3519/1199/head
2025-12-04T09:33:41.9319831Z  * [new branch]              gh/zou3519/1199/orig        -> origin/gh/zou3519/1199/orig
2025-12-04T09:33:41.9321529Z  * [new branch]              gh/zou3519/1200/base        -> origin/gh/zou3519/1200/base
2025-12-04T09:33:41.9322901Z  * [new branch]              gh/zou3519/1200/head        -> origin/gh/zou3519/1200/head
2025-12-04T09:33:41.9324204Z  * [new branch]              gh/zou3519/1200/orig        -> origin/gh/zou3519/1200/orig
2025-12-04T09:33:41.9325922Z  * [new branch]              gh/zou3519/1201/base        -> origin/gh/zou3519/1201/base
2025-12-04T09:33:41.9327184Z  * [new branch]              gh/zou3519/1201/head        -> origin/gh/zou3519/1201/head
2025-12-04T09:33:41.9328412Z  * [new branch]              gh/zou3519/1201/orig        -> origin/gh/zou3519/1201/orig
2025-12-04T09:33:41.9329969Z  * [new branch]              gh/zou3519/1202/base        -> origin/gh/zou3519/1202/base
2025-12-04T09:33:41.9331215Z  * [new branch]              gh/zou3519/1202/head        -> origin/gh/zou3519/1202/head
2025-12-04T09:33:41.9332503Z  * [new branch]              gh/zou3519/1202/orig        -> origin/gh/zou3519/1202/orig
2025-12-04T09:33:41.9334690Z  * [new branch]              gh/zpcore/1/base            -> origin/gh/zpcore/1/base
2025-12-04T09:33:41.9335919Z  * [new branch]              gh/zpcore/1/head            -> origin/gh/zpcore/1/head
2025-12-04T09:33:41.9337699Z  * [new branch]              gh/zpcore/11/base           -> origin/gh/zpcore/11/base
2025-12-04T09:33:41.9339012Z  * [new branch]              gh/zpcore/11/head           -> origin/gh/zpcore/11/head
2025-12-04T09:33:41.9340234Z  * [new branch]              gh/zpcore/11/orig           -> origin/gh/zpcore/11/orig
2025-12-04T09:33:41.9342417Z  * [new branch]              gh/zpcore/12/base           -> origin/gh/zpcore/12/base
2025-12-04T09:33:41.9343724Z  * [new branch]              gh/zpcore/12/head           -> origin/gh/zpcore/12/head
2025-12-04T09:33:41.9345091Z  * [new branch]              gh/zpcore/12/orig           -> origin/gh/zpcore/12/orig
2025-12-04T09:33:41.9346879Z  * [new branch]              gh/zpcore/13/base           -> origin/gh/zpcore/13/base
2025-12-04T09:33:41.9348079Z  * [new branch]              gh/zpcore/13/head           -> origin/gh/zpcore/13/head
2025-12-04T09:33:41.9349316Z  * [new branch]              gh/zpcore/13/orig           -> origin/gh/zpcore/13/orig
2025-12-04T09:33:41.9351114Z  * [new branch]              gh/zpcore/14/base           -> origin/gh/zpcore/14/base
2025-12-04T09:33:41.9352568Z  * [new branch]              gh/zpcore/14/head           -> origin/gh/zpcore/14/head
2025-12-04T09:33:41.9353805Z  * [new branch]              gh/zpcore/14/orig           -> origin/gh/zpcore/14/orig
2025-12-04T09:33:41.9355780Z  * [new branch]              gh/zpcore/15/base           -> origin/gh/zpcore/15/base
2025-12-04T09:33:41.9357045Z  * [new branch]              gh/zpcore/15/head           -> origin/gh/zpcore/15/head
2025-12-04T09:33:41.9358322Z  * [new branch]              gh/zpcore/15/orig           -> origin/gh/zpcore/15/orig
2025-12-04T09:33:41.9360035Z  * [new branch]              gh/zpcore/2/base            -> origin/gh/zpcore/2/base
2025-12-04T09:33:41.9361364Z  * [new branch]              gh/zpcore/2/head            -> origin/gh/zpcore/2/head
2025-12-04T09:33:41.9363829Z  * [new branch]              gh/zpcore/21/base           -> origin/gh/zpcore/21/base
2025-12-04T09:33:41.9365246Z  * [new branch]              gh/zpcore/21/head           -> origin/gh/zpcore/21/head
2025-12-04T09:33:41.9366495Z  * [new branch]              gh/zpcore/21/orig           -> origin/gh/zpcore/21/orig
2025-12-04T09:33:41.9368897Z  * [new branch]              gh/zpcore/22/base           -> origin/gh/zpcore/22/base
2025-12-04T09:33:41.9370169Z  * [new branch]              gh/zpcore/22/head           -> origin/gh/zpcore/22/head
2025-12-04T09:33:41.9371598Z  * [new branch]              gh/zpcore/22/orig           -> origin/gh/zpcore/22/orig
2025-12-04T09:33:41.9373337Z  * [new branch]              gh/zpcore/23/base           -> origin/gh/zpcore/23/base
2025-12-04T09:33:41.9374659Z  * [new branch]              gh/zpcore/23/head           -> origin/gh/zpcore/23/head
2025-12-04T09:33:41.9375901Z  * [new branch]              gh/zpcore/23/orig           -> origin/gh/zpcore/23/orig
2025-12-04T09:33:41.9377476Z  * [new branch]              gh/zpcore/24/base           -> origin/gh/zpcore/24/base
2025-12-04T09:33:41.9378787Z  * [new branch]              gh/zpcore/24/head           -> origin/gh/zpcore/24/head
2025-12-04T09:33:41.9380103Z  * [new branch]              gh/zpcore/24/orig           -> origin/gh/zpcore/24/orig
2025-12-04T09:33:41.9382005Z  * [new branch]              gh/zpcore/25/base           -> origin/gh/zpcore/25/base
2025-12-04T09:33:41.9383231Z  * [new branch]              gh/zpcore/25/head           -> origin/gh/zpcore/25/head
2025-12-04T09:33:41.9384511Z  * [new branch]              gh/zpcore/25/orig           -> origin/gh/zpcore/25/orig
2025-12-04T09:33:41.9386264Z  * [new branch]              gh/zpcore/26/base           -> origin/gh/zpcore/26/base
2025-12-04T09:33:41.9387662Z  * [new branch]              gh/zpcore/26/head           -> origin/gh/zpcore/26/head
2025-12-04T09:33:41.9389036Z  * [new branch]              gh/zpcore/26/orig           -> origin/gh/zpcore/26/orig
2025-12-04T09:33:41.9390892Z  * [new branch]              gh/zpcore/27/base           -> origin/gh/zpcore/27/base
2025-12-04T09:33:41.9392155Z  * [new branch]              gh/zpcore/27/head           -> origin/gh/zpcore/27/head
2025-12-04T09:33:41.9393376Z  * [new branch]              gh/zpcore/27/orig           -> origin/gh/zpcore/27/orig
2025-12-04T09:33:41.9395701Z  * [new branch]              gh/zpcore/28/base           -> origin/gh/zpcore/28/base
2025-12-04T09:33:41.9397535Z  * [new branch]              gh/zpcore/28/head           -> origin/gh/zpcore/28/head
2025-12-04T09:33:41.9399279Z  * [new branch]              gh/zpcore/28/orig           -> origin/gh/zpcore/28/orig
2025-12-04T09:33:41.9400967Z  * [new branch]              gh/zpcore/3/base            -> origin/gh/zpcore/3/base
2025-12-04T09:33:41.9402800Z  * [new branch]              gh/zpcore/3/head            -> origin/gh/zpcore/3/head
2025-12-04T09:33:41.9404358Z  * [new branch]              gh/zpcore/4/base            -> origin/gh/zpcore/4/base
2025-12-04T09:33:41.9405608Z  * [new branch]              gh/zpcore/4/head            -> origin/gh/zpcore/4/head
2025-12-04T09:33:41.9407168Z  * [new branch]              gh/zpcore/5/base            -> origin/gh/zpcore/5/base
2025-12-04T09:33:41.9408429Z  * [new branch]              gh/zpcore/5/head            -> origin/gh/zpcore/5/head
2025-12-04T09:33:41.9409938Z  * [new branch]              gh/zpcore/6/base            -> origin/gh/zpcore/6/base
2025-12-04T09:33:41.9411176Z  * [new branch]              gh/zpcore/6/head            -> origin/gh/zpcore/6/head
2025-12-04T09:33:41.9413112Z  * [new branch]              gh/zpcore/7/base            -> origin/gh/zpcore/7/base
2025-12-04T09:33:41.9414301Z  * [new branch]              gh/zpcore/7/head            -> origin/gh/zpcore/7/head
2025-12-04T09:33:41.9415920Z  * [new branch]              gh/zpcore/8/base            -> origin/gh/zpcore/8/base
2025-12-04T09:33:41.9417213Z  * [new branch]              gh/zpcore/8/head            -> origin/gh/zpcore/8/head
2025-12-04T09:33:41.9418712Z  * [new branch]              google-main                 -> origin/google-main
2025-12-04T09:33:41.9420580Z  * [new branch]              guangyey/external_stream    -> origin/guangyey/external_stream
2025-12-04T09:33:41.9421729Z  * [new branch]              guangyey/test_2025          -> origin/guangyey/test_2025
2025-12-04T09:33:41.9423844Z  * [new branch]              guilhermeleobas/cherry-pick-55d87d9dfd9 -> origin/guilhermeleobas/cherry-pick-55d87d9dfd9
2025-12-04T09:33:41.9425436Z  * [new branch]              hameerabbasi/complex_tensor_subclass -> origin/hameerabbasi/complex_tensor_subclass
2025-12-04T09:33:41.9427337Z  * [new branch]              hameerabbasi/fix-ctensor-gradcheck-tests -> origin/hameerabbasi/fix-ctensor-gradcheck-tests
2025-12-04T09:33:41.9428484Z  * [new branch]              hameerabbasi/gradcheck-allclose -> origin/hameerabbasi/gradcheck-allclose
2025-12-04T09:33:41.9429711Z  * [new branch]              hc_baseline                 -> origin/hc_baseline
2025-12-04T09:33:41.9431587Z  * [new branch]              hhh_rand                    -> origin/hhh_rand
2025-12-04T09:33:41.9433295Z  * [new branch]              huba/f1                     -> origin/huba/f1
2025-12-04T09:33:41.9435428Z  * [new branch]              increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test -> origin/increase-timeout-linux-jammy-cuda12_8-py3_10-gcc11-test
2025-12-04T09:33:41.9436248Z  * [new branch]              inlining                    -> origin/inlining
2025-12-04T09:33:41.9437732Z  * [new branch]              inlining-ezyang             -> origin/inlining-ezyang
2025-12-04T09:33:41.9439063Z  * [new branch]              install-torchao-0.13.0      -> origin/install-torchao-0.13.0
2025-12-04T09:33:41.9440707Z  * [new branch]              instrument-trunk-pull-linux-with-job-test-filters -> origin/instrument-trunk-pull-linux-with-job-test-filters
2025-12-04T09:33:41.9442092Z  * [new branch]              invoke-subgraph             -> origin/invoke-subgraph
2025-12-04T09:33:41.9443667Z  * [new branch]              issue#58739                 -> origin/issue#58739
2025-12-04T09:33:41.9445214Z  * [new branch]              jainapurva-patch-1          -> origin/jainapurva-patch-1
2025-12-04T09:33:41.9446796Z  * [new branch]              jathu/o3                    -> origin/jathu/o3
2025-12-04T09:33:41.9448022Z  * [new branch]              jathu/sve                   -> origin/jathu/sve
2025-12-04T09:33:41.9450032Z  * [new branch]              jcaip/test-cusparselt-version-0.6.2 -> origin/jcaip/test-cusparselt-version-0.6.2
2025-12-04T09:33:41.9451277Z  * [new branch]              jcaip/update-cusparselt-0.6.2 -> origin/jcaip/update-cusparselt-0.6.2
2025-12-04T09:33:41.9452968Z  * [new branch]              jiannanWang/memorysnapshot_filter -> origin/jiannanWang/memorysnapshot_filter
2025-12-04T09:33:41.9454255Z  * [new branch]              jiannanWang/profilerstepwarning -> origin/jiannanWang/profilerstepwarning
2025-12-04T09:33:41.9455592Z  * [new branch]              jithunnair-amd-patch-1      -> origin/jithunnair-amd-patch-1
2025-12-04T09:33:41.9457004Z  * [new branch]              jithunnair-amd-patch-10     -> origin/jithunnair-amd-patch-10
2025-12-04T09:33:41.9458396Z  * [new branch]              jithunnair-amd-patch-2      -> origin/jithunnair-amd-patch-2
2025-12-04T09:33:41.9459738Z  * [new branch]              jithunnair-amd-patch-3      -> origin/jithunnair-amd-patch-3
2025-12-04T09:33:41.9461136Z  * [new branch]              jithunnair-amd-patch-4      -> origin/jithunnair-amd-patch-4
2025-12-04T09:33:41.9462436Z  * [new branch]              jithunnair-amd-patch-5      -> origin/jithunnair-amd-patch-5
2025-12-04T09:33:41.9463969Z  * [new branch]              jithunnair-amd-patch-6      -> origin/jithunnair-amd-patch-6
2025-12-04T09:33:41.9465216Z  * [new branch]              jithunnair-amd-patch-7      -> origin/jithunnair-amd-patch-7
2025-12-04T09:33:41.9466649Z  * [new branch]              jithunnair-amd-patch-8      -> origin/jithunnair-amd-patch-8
2025-12-04T09:33:41.9468057Z  * [new branch]              jithunnair-amd-patch-9      -> origin/jithunnair-amd-patch-9
2025-12-04T09:33:41.9469908Z  * [new branch]              justinchu/native-qdq        -> origin/justinchu/native-qdq
2025-12-04T09:33:41.9471647Z  * [new branch]              kainan666/xlf_debug         -> origin/kainan666/xlf_debug
2025-12-04T09:33:41.9472880Z  * [new branch]              kainan_test                 -> origin/kainan_test
2025-12-04T09:33:41.9474228Z  * [new branch]              larryliu0820-patch-1        -> origin/larryliu0820-patch-1
2025-12-04T09:33:41.9476044Z  * [new branch]              leslie/test_group_gemm_epilogues -> origin/leslie/test_group_gemm_epilogues
2025-12-04T09:33:41.9477783Z  * [new branch]              lessw2020/fix_cutlass_cache_error -> origin/lessw2020/fix_cutlass_cache_error
2025-12-04T09:33:41.9479410Z  * [new branch]              liaoxuan/shm_all_reduce     -> origin/liaoxuan/shm_all_reduce
2025-12-04T09:33:41.9480873Z  * [new branch]              liaoxuan/test_fa_disable_softmax -> origin/liaoxuan/test_fa_disable_softmax
2025-12-04T09:33:41.9482081Z  * [new branch]              liaoxuan/test_int8_sdpa     -> origin/liaoxuan/test_int8_sdpa
2025-12-04T09:33:41.9483483Z  * [new branch]              llama4-stable               -> origin/llama4-stable
2025-12-04T09:33:41.9485691Z  * [new branch]              lts/release/1.8             -> origin/lts/release/1.8
2025-12-04T09:33:41.9487578Z  * [new branch]              lucaskabela/#94773          -> origin/lucaskabela/#94773
2025-12-04T09:33:41.9488752Z  * [new branch]              lucaskabela/fix_164876      -> origin/lucaskabela/fix_164876
2025-12-04T09:33:41.9490005Z  * [new branch]              lucaskabela/flop_counter    -> origin/lucaskabela/flop_counter
2025-12-04T09:33:41.9491529Z  * [new branch]              lucaskabela/func_under_decomp -> origin/lucaskabela/func_under_decomp
2025-12-04T09:33:41.9492771Z  * [new branch]              lucaskabela/functional_in_dynamo -> origin/lucaskabela/functional_in_dynamo
2025-12-04T09:33:41.9494121Z  * [new branch]              lucaskabela/install_params_as_graph_attr -> origin/lucaskabela/install_params_as_graph_attr
2025-12-04T09:33:41.9495620Z  * [new branch]              lucaskabela/parameters_as_graph_attr -> origin/lucaskabela/parameters_as_graph_attr
2025-12-04T09:33:41.9497330Z  * [new branch]              lucaskabela/remove_aot_dispatcher_metadata -> origin/lucaskabela/remove_aot_dispatcher_metadata
2025-12-04T09:33:41.9498460Z  * [new branch]              lucaskabela/rnn_decomp      -> origin/lucaskabela/rnn_decomp
2025-12-04T09:33:41.9499868Z  * [new branch]              lucaskabela/typing_backends -> origin/lucaskabela/typing_backends
2025-12-04T09:33:41.9501315Z  * [new branch]              lucaskabela/typing_ctx_manager -> origin/lucaskabela/typing_ctx_manager
2025-12-04T09:33:41.9502736Z  * [new branch]              lucaskabela/typing_nn_module -> origin/lucaskabela/typing_nn_module
2025-12-04T09:33:41.9503993Z  * [new branch]              lucaskabela/typing_user_defined -> origin/lucaskabela/typing_user_defined
2025-12-04T09:33:41.9505264Z  * [new branch]              lucaskabela/typing_variables -> origin/lucaskabela/typing_variables
2025-12-04T09:33:41.9506573Z  * [new branch]              lucaskabela/typing_variables_dicts -> origin/lucaskabela/typing_variables_dicts
2025-12-04T09:33:41.9507937Z  * [new branch]              lucaskabela/typing_variables_functions -> origin/lucaskabela/typing_variables_functions
2025-12-04T09:33:41.9509136Z  * [new branch]              lucaskabela/typing_variables_lists -> origin/lucaskabela/typing_variables_lists
2025-12-04T09:33:41.9510828Z  * [new branch]              lw/torch_box_by_ref         -> origin/lw/torch_box_by_ref
2025-12-04T09:33:41.9512208Z  * [new branch]              main                        -> origin/main
2025-12-04T09:33:41.9513720Z  * [new branch]              malfet-patch-1              -> origin/malfet-patch-1
2025-12-04T09:33:41.9515256Z  * [new branch]              malfet-patch-2              -> origin/malfet-patch-2
2025-12-04T09:33:41.9516656Z  * [new branch]              malfet-patch-3              -> origin/malfet-patch-3
2025-12-04T09:33:41.9518141Z  * [new branch]              malfet-patch-4              -> origin/malfet-patch-4
2025-12-04T09:33:41.9519518Z  * [new branch]              malfet-patch-5              -> origin/malfet-patch-5
2025-12-04T09:33:41.9520977Z  * [new branch]              malfet-patch-6              -> origin/malfet-patch-6
2025-12-04T09:33:41.9522478Z  * [new branch]              malfet-patch-7              -> origin/malfet-patch-7
2025-12-04T09:33:41.9523937Z  * [new branch]              malfet-patch-8              -> origin/malfet-patch-8
2025-12-04T09:33:41.9525677Z  * [new branch]              malfet/add-3.14-ci          -> origin/malfet/add-3.14-ci
2025-12-04T09:33:41.9527757Z  * [new branch]              malfet/be-do-not-make-typos-in-build-artifacts -> origin/malfet/be-do-not-make-typos-in-build-artifacts
2025-12-04T09:33:41.9528954Z  * [new branch]              malfet/be-move-more-settings-to-checkout-pytorch -> origin/malfet/be-move-more-settings-to-checkout-pytorch
2025-12-04T09:33:41.9530448Z  * [new branch]              malfet/be-remove-misisng-neon-headers -> origin/malfet/be-remove-misisng-neon-headers
2025-12-04T09:33:41.9532254Z  * [new branch]              malfet/mps-implement-col2im -> origin/malfet/mps-implement-col2im
2025-12-04T09:33:41.9534141Z  * [new branch]              manuel/aoti_metal_shimify-thread_safe -> origin/manuel/aoti_metal_shimify-thread_safe
2025-12-04T09:33:41.9535144Z  * [new branch]              manuel/inductor_link_openmp -> origin/manuel/inductor_link_openmp
2025-12-04T09:33:41.9537363Z  * [new branch]              masnesral/metaconda         -> origin/masnesral/metaconda
2025-12-04T09:33:41.9538828Z  * [new branch]              mem_profiler_flaky_fix      -> origin/mem_profiler_flaky_fix
2025-12-04T09:33:41.9540175Z  * [new branch]              mem_profiler_stack_trace    -> origin/mem_profiler_stack_trace
2025-12-04T09:33:41.9541719Z  * [new branch]              memory_profiler_stack       -> origin/memory_profiler_stack
2025-12-04T09:33:41.9543075Z  * [new branch]              metascroy-patch-1           -> origin/metascroy-patch-1
2025-12-04T09:33:41.9544387Z  * [new branch]              mingw_posix                 -> origin/mingw_posix
2025-12-04T09:33:41.9546236Z  * [new branch]              mlazos/S429861-debug        -> origin/mlazos/S429861-debug
2025-12-04T09:33:41.9547468Z  * [new branch]              mlazos/aa                   -> origin/mlazos/aa
2025-12-04T09:33:41.9548749Z  * [new branch]              mlazos/acts                 -> origin/mlazos/acts
2025-12-04T09:33:41.9549972Z  * [new branch]              mlazos/arg-renames          -> origin/mlazos/arg-renames
2025-12-04T09:33:41.9551252Z  * [new branch]              mlazos/bad-cudagraphs       -> origin/mlazos/bad-cudagraphs
2025-12-04T09:33:41.9552551Z  * [new branch]              mlazos/baseline-graph-breaks -> origin/mlazos/baseline-graph-breaks
2025-12-04T09:33:41.9553719Z  * [new branch]              mlazos/beta-tensor          -> origin/mlazos/beta-tensor
2025-12-04T09:33:41.9554912Z  * [new branch]              mlazos/buffers              -> origin/mlazos/buffers
2025-12-04T09:33:41.9555993Z  * [new branch]              mlazos/buffers2             -> origin/mlazos/buffers2
2025-12-04T09:33:41.9557527Z  * [new branch]              mlazos/buffers3             -> origin/mlazos/buffers3
2025-12-04T09:33:41.9559001Z  * [new branch]              mlazos/bwd                  -> origin/mlazos/bwd
2025-12-04T09:33:41.9560253Z  * [new branch]              mlazos/combo-test           -> origin/mlazos/combo-test
2025-12-04T09:33:41.9561694Z  * [new branch]              mlazos/ctx-cleanup          -> origin/mlazos/ctx-cleanup
2025-12-04T09:33:41.9563141Z  * [new branch]              mlazos/cuda-cmd-log         -> origin/mlazos/cuda-cmd-log
2025-12-04T09:33:41.9564569Z  * [new branch]              mlazos/cudagraph-tests      -> origin/mlazos/cudagraph-tests
2025-12-04T09:33:41.9565912Z  * [new branch]              mlazos/cudagraphs-measurement -> origin/mlazos/cudagraphs-measurement
2025-12-04T09:33:41.9567276Z  * [new branch]              mlazos/cutlass-test         -> origin/mlazos/cutlass-test
2025-12-04T09:33:41.9568634Z  * [new branch]              mlazos/cutlass-topo-bug     -> origin/mlazos/cutlass-topo-bug
2025-12-04T09:33:41.9569868Z  * [new branch]              mlazos/dataclass-proxy      -> origin/mlazos/dataclass-proxy
2025-12-04T09:33:41.9571098Z  * [new branch]              mlazos/dc-attrs             -> origin/mlazos/dc-attrs
2025-12-04T09:33:41.9572376Z  * [new branch]              mlazos/dc-helion            -> origin/mlazos/dc-helion
2025-12-04T09:33:41.9573656Z  * [new branch]              mlazos/dict-fix             -> origin/mlazos/dict-fix
2025-12-04T09:33:41.9574911Z  * [new branch]              mlazos/disable-tf           -> origin/mlazos/disable-tf
2025-12-04T09:33:41.9576195Z  * [new branch]              mlazos/dupe-fix             -> origin/mlazos/dupe-fix
2025-12-04T09:33:41.9577552Z  * [new branch]              mlazos/dyn-batch            -> origin/mlazos/dyn-batch
2025-12-04T09:33:41.9578812Z  * [new branch]              mlazos/evt                  -> origin/mlazos/evt
2025-12-04T09:33:41.9580130Z  * [new branch]              mlazos/extract-examples     -> origin/mlazos/extract-examples
2025-12-04T09:33:41.9581371Z  * [new branch]              mlazos/foreach-op           -> origin/mlazos/foreach-op
2025-12-04T09:33:41.9582683Z  * [new branch]              mlazos/fp8                  -> origin/mlazos/fp8
2025-12-04T09:33:41.9583967Z  * [new branch]              mlazos/fp8-bias             -> origin/mlazos/fp8-bias
2025-12-04T09:33:41.9585312Z  * [new branch]              mlazos/fp8-bias-fusion      -> origin/mlazos/fp8-bias-fusion
2025-12-04T09:33:41.9586577Z  * [new branch]              mlazos/fp8-fixes            -> origin/mlazos/fp8-fixes
2025-12-04T09:33:41.9587859Z  * [new branch]              mlazos/freezing             -> origin/mlazos/freezing
2025-12-04T09:33:41.9589131Z  * [new branch]              mlazos/h-comp               -> origin/mlazos/h-comp
2025-12-04T09:33:41.9590483Z  * [new branch]              mlazos/h-comp2              -> origin/mlazos/h-comp2
2025-12-04T09:33:41.9592337Z  * [new branch]              mlazos/hash-hop             -> origin/mlazos/hash-hop
2025-12-04T09:33:41.9593617Z  * [new branch]              mlazos/hc                   -> origin/mlazos/hc
2025-12-04T09:33:41.9595010Z  * [new branch]              mlazos/hc-cycles            -> origin/mlazos/hc-cycles
2025-12-04T09:33:41.9596302Z  * [new branch]              mlazos/hc-fixes             -> origin/mlazos/hc-fixes
2025-12-04T09:33:41.9597607Z  * [new branch]              mlazos/hc-fixes3            -> origin/mlazos/hc-fixes3
2025-12-04T09:33:41.9598881Z  * [new branch]              mlazos/hc-fixes4            -> origin/mlazos/hc-fixes4
2025-12-04T09:33:41.9600283Z  * [new branch]              mlazos/hc-hf                -> origin/mlazos/hc-hf
2025-12-04T09:33:41.9604578Z  * [new branch]              mlazos/hc-mut               -> origin/mlazos/hc-mut
2025-12-04T09:33:41.9606064Z  * [new branch]              mlazos/hc10                 -> origin/mlazos/hc10
2025-12-04T09:33:41.9607428Z  * [new branch]              mlazos/hc11                 -> origin/mlazos/hc11
2025-12-04T09:33:41.9608668Z  * [new branch]              mlazos/hc12                 -> origin/mlazos/hc12
2025-12-04T09:33:41.9609934Z  * [new branch]              mlazos/hc13                 -> origin/mlazos/hc13
2025-12-04T09:33:41.9611238Z  * [new branch]              mlazos/hc14                 -> origin/mlazos/hc14
2025-12-04T09:33:41.9612436Z  * [new branch]              mlazos/hc15                 -> origin/mlazos/hc15
2025-12-04T09:33:41.9613715Z  * [new branch]              mlazos/hc2                  -> origin/mlazos/hc2
2025-12-04T09:33:41.9615478Z  * [new branch]              mlazos/hc4                  -> origin/mlazos/hc4
2025-12-04T09:33:41.9616767Z  * [new branch]              mlazos/hc5                  -> origin/mlazos/hc5
2025-12-04T09:33:41.9618005Z  * [new branch]              mlazos/hc6                  -> origin/mlazos/hc6
2025-12-04T09:33:41.9619332Z  * [new branch]              mlazos/hc7                  -> origin/mlazos/hc7
2025-12-04T09:33:41.9620622Z  * [new branch]              mlazos/hc8                  -> origin/mlazos/hc8
2025-12-04T09:33:41.9621772Z  * [new branch]              mlazos/hc9                  -> origin/mlazos/hc9
2025-12-04T09:33:41.9623138Z  * [new branch]              mlazos/hc_baseline2         -> origin/mlazos/hc_baseline2
2025-12-04T09:33:41.9624321Z  * [new branch]              mlazos/inductor-streams     -> origin/mlazos/inductor-streams
2025-12-04T09:33:41.9625392Z  * [new branch]              mlazos/main                 -> origin/mlazos/main
2025-12-04T09:33:41.9626661Z  * [new branch]              mlazos/mcg2                 -> origin/mlazos/mcg2
2025-12-04T09:33:41.9628127Z  * [new branch]              mlazos/meta-guards          -> origin/mlazos/meta-guards
2025-12-04T09:33:41.9630021Z  * [new branch]              mlazos/mlazos/foreach-map-adam -> origin/mlazos/mlazos/foreach-map-adam
2025-12-04T09:33:41.9631330Z  * [new branch]              mlazos/mlazos/tf-mode-backup -> origin/mlazos/mlazos/tf-mode-backup
2025-12-04T09:33:41.9632566Z  * [new branch]              mlazos/mod-fix              -> origin/mlazos/mod-fix
2025-12-04T09:33:41.9633895Z  * [new branch]              mlazos/mode-fix             -> origin/mlazos/mode-fix
2025-12-04T09:33:41.9635149Z  * [new branch]              mlazos/offsets              -> origin/mlazos/offsets
2025-12-04T09:33:41.9636336Z  * [new branch]              mlazos/overguarding         -> origin/mlazos/overguarding
2025-12-04T09:33:41.9637615Z  * [new branch]              mlazos/proxy-ctors          -> origin/mlazos/proxy-ctors
2025-12-04T09:33:41.9638904Z  * [new branch]              mlazos/quant-fix            -> origin/mlazos/quant-fix
2025-12-04T09:33:41.9640152Z  * [new branch]              mlazos/resnet-fix           -> origin/mlazos/resnet-fix
2025-12-04T09:33:41.9641478Z  * [new branch]              mlazos/rm-buf-names         -> origin/mlazos/rm-buf-names
2025-12-04T09:33:41.9642901Z  * [new branch]              mlazos/rm-code              -> origin/mlazos/rm-code
2025-12-04T09:33:41.9644192Z  * [new branch]              mlazos/rm-spam              -> origin/mlazos/rm-spam
2025-12-04T09:33:41.9645530Z  * [new branch]              mlazos/rtp                  -> origin/mlazos/rtp
2025-12-04T09:33:41.9646894Z  * [new branch]              mlazos/static-idx-dbg       -> origin/mlazos/static-idx-dbg
2025-12-04T09:33:41.9648313Z  * [new branch]              mlazos/static-inputs-log    -> origin/mlazos/static-inputs-log
2025-12-04T09:33:41.9649354Z  * [new branch]              mlazos/stests               -> origin/mlazos/stests
2025-12-04T09:33:41.9650670Z  * [new branch]              mlazos/stream-ops           -> origin/mlazos/stream-ops
2025-12-04T09:33:41.9651890Z  * [new branch]              mlazos/td-fix2              -> origin/mlazos/td-fix2
2025-12-04T09:33:41.9653201Z  * [new branch]              mlazos/tensor-hasattr2      -> origin/mlazos/tensor-hasattr2
2025-12-04T09:33:41.9654469Z  * [new branch]              mlazos/test                 -> origin/mlazos/test
2025-12-04T09:33:41.9655777Z  * [new branch]              mlazos/tf-mode              -> origin/mlazos/tf-mode
2025-12-04T09:33:41.9657101Z  * [new branch]              mlazos/tf-mode-backup2      -> origin/mlazos/tf-mode-backup2
2025-12-04T09:33:41.9658351Z  * [new branch]              mlazos/tf-mode-reland       -> origin/mlazos/tf-mode-reland
2025-12-04T09:33:41.9659735Z  * [new branch]              mlazos/tf-mode-reland2      -> origin/mlazos/tf-mode-reland2
2025-12-04T09:33:41.9660983Z  * [new branch]              mlazos/tf-mode-reland3      -> origin/mlazos/tf-mode-reland3
2025-12-04T09:33:41.9662258Z  * [new branch]              mlazos/triton-no-epi        -> origin/mlazos/triton-no-epi
2025-12-04T09:33:41.9663578Z  * [new branch]              mlazos/tune-proto           -> origin/mlazos/tune-proto
2025-12-04T09:33:41.9664821Z  * [new branch]              mlazos/tuple-fixes          -> origin/mlazos/tuple-fixes
2025-12-04T09:33:41.9666212Z  * [new branch]              mlazos/tuple-fixes2         -> origin/mlazos/tuple-fixes2
2025-12-04T09:33:41.9667521Z  * [new branch]              mlazos/tuple-handling       -> origin/mlazos/tuple-handling
2025-12-04T09:33:41.9668930Z  * [new branch]              mlazos/user-stream-base     -> origin/mlazos/user-stream-base
2025-12-04T09:33:41.9670184Z  * [new branch]              mlazos/user-streams         -> origin/mlazos/user-streams
2025-12-04T09:33:41.9671517Z  * [new branch]              mlazos/user-streams-backup  -> origin/mlazos/user-streams-backup
2025-12-04T09:33:41.9672879Z  * [new branch]              mlazos/user-streams-backup2 -> origin/mlazos/user-streams-backup2
2025-12-04T09:33:41.9674018Z  * [new branch]              mlazos/vary-beta            -> origin/mlazos/vary-beta
2025-12-04T09:33:41.9675335Z  * [new branch]              mlazos/vary-beta2           -> origin/mlazos/vary-beta2
2025-12-04T09:33:41.9676585Z  * [new branch]              mlazos/weird-perf1          -> origin/mlazos/weird-perf1
2025-12-04T09:33:41.9678042Z  * [new branch]              mm_out_dtype_compile        -> origin/mm_out_dtype_compile
2025-12-04T09:33:41.9679351Z  * [new branch]              module-shim                 -> origin/module-shim
2025-12-04T09:33:41.9680729Z  * [new branch]              move_config                 -> origin/move_config
2025-12-04T09:33:41.9682671Z  * [new branch]              msaroufim/reduce            -> origin/msaroufim/reduce
2025-12-04T09:33:41.9684488Z  * [new branch]              mtia/basic-cmake            -> origin/mtia/basic-cmake
2025-12-04T09:33:41.9686299Z  * [new branch]              mwizak/fix-triton-block-shape -> origin/mwizak/fix-triton-block-shape
2025-12-04T09:33:41.9687611Z  * [new branch]              my_varlen_backup            -> origin/my_varlen_backup
2025-12-04T09:33:41.9688995Z  * [new branch]              nativert_num_outputs        -> origin/nativert_num_outputs
2025-12-04T09:33:41.9690320Z  * [new branch]              new-codegen                 -> origin/new-codegen
2025-12-04T09:33:41.9691677Z  * [new branch]              newtest-base                -> origin/newtest-base
2025-12-04T09:33:41.9693441Z  * [new branch]              ngimel/addmm_dtype          -> origin/ngimel/addmm_dtype
2025-12-04T09:33:41.9694630Z  * [new branch]              ngimel/div_inv              -> origin/ngimel/div_inv
2025-12-04T09:33:41.9695934Z  * [new branch]              ngimel/error_index_list     -> origin/ngimel/error_index_list
2025-12-04T09:33:41.9697121Z  * [new branch]              ngimel/gather_grid          -> origin/ngimel/gather_grid
2025-12-04T09:33:41.9698385Z  * [new branch]              ngimel/gather_grid_release  -> origin/ngimel/gather_grid_release
2025-12-04T09:33:41.9699494Z  * [new branch]              ngimel/gg_new               -> origin/ngimel/gg_new
2025-12-04T09:33:41.9700704Z  * [new branch]              ngimel/hostalloc            -> origin/ngimel/hostalloc
2025-12-04T09:33:41.9702113Z  * [new branch]              ngimel/storage_id           -> origin/ngimel/storage_id
2025-12-04T09:33:41.9703540Z  * [new branch]              nightly                     -> origin/nightly
2025-12-04T09:33:41.9705765Z  * [new branch]              nikitaved/addmm_1_rowcol_lt_path_check -> origin/nikitaved/addmm_1_rowcol_lt_path_check
2025-12-04T09:33:41.9707208Z  * [new branch]              nikitaved/addmm_epilogue_fusions_2d_bias -> origin/nikitaved/addmm_epilogue_fusions_2d_bias
2025-12-04T09:33:41.9708430Z  * [new branch]              nikitaved/addmm_epilogue_fusions_inductor -> origin/nikitaved/addmm_epilogue_fusions_inductor
2025-12-04T09:33:41.9709920Z  * [new branch]              nikitaved/addmm_epilogue_fusions_scratch -> origin/nikitaved/addmm_epilogue_fusions_scratch
2025-12-04T09:33:41.9711454Z  * [new branch]              nikitaved/grad_addmm_epilogue_fusions -> origin/nikitaved/grad_addmm_epilogue_fusions
2025-12-04T09:33:41.9713146Z  * [new branch]              nikitaved/simpler_can_use_32bit_index -> origin/nikitaved/simpler_can_use_32bit_index
2025-12-04T09:33:41.9714487Z  * [new branch]              nikitaved/test              -> origin/nikitaved/test
2025-12-04T09:33:41.9716150Z  * [new branch]              nmacchioni-perf-test-async-autotune -> origin/nmacchioni-perf-test-async-autotune
2025-12-04T09:33:41.9717391Z  * [new branch]              no_distributed_log_spew     -> origin/no_distributed_log_spew
2025-12-04T09:33:41.9718756Z  * [new branch]              nofun-hack                  -> origin/nofun-hack
2025-12-04T09:33:41.9720078Z  * [new branch]              norm_bench                  -> origin/norm_bench
2025-12-04T09:33:41.9721915Z  * [new branch]              nullplay/fuse_matmul        -> origin/nullplay/fuse_matmul
2025-12-04T09:33:41.9723445Z  * [new branch]              nullplay_fuse_matmul        -> origin/nullplay_fuse_matmul
2025-12-04T09:33:41.9724846Z  * [new branch]              optimizer_test              -> origin/optimizer_test
2025-12-04T09:33:41.9727208Z  * [new branch]              orig/release/1.10           -> origin/orig/release/1.10
2025-12-04T09:33:41.9728565Z  * [new branch]              orig/release/1.11           -> origin/orig/release/1.11
2025-12-04T09:33:41.9729903Z  * [new branch]              orig/release/1.12           -> origin/orig/release/1.12
2025-12-04T09:33:41.9731439Z  * [new branch]              orig/release/1.13           -> origin/orig/release/1.13
2025-12-04T09:33:41.9732834Z  * [new branch]              orig/release/1.6            -> origin/orig/release/1.6
2025-12-04T09:33:41.9734338Z  * [new branch]              orig/release/1.7            -> origin/orig/release/1.7
2025-12-04T09:33:41.9735718Z  * [new branch]              orig/release/1.8            -> origin/orig/release/1.8
2025-12-04T09:33:41.9737071Z  * [new branch]              orig/release/1.9            -> origin/orig/release/1.9
2025-12-04T09:33:41.9738343Z  * [new branch]              orig/release/2.0            -> origin/orig/release/2.0
2025-12-04T09:33:41.9739634Z  * [new branch]              orig/release/2.1            -> origin/orig/release/2.1
2025-12-04T09:33:41.9741047Z  * [new branch]              orig/release/2.2            -> origin/orig/release/2.2
2025-12-04T09:33:41.9742237Z  * [new branch]              orig/release/2.3            -> origin/orig/release/2.3
2025-12-04T09:33:41.9743504Z  * [new branch]              orig/release/2.4            -> origin/orig/release/2.4
2025-12-04T09:33:41.9745210Z  * [new branch]              orig/release/2.5            -> origin/orig/release/2.5
2025-12-04T09:33:41.9746982Z  * [new branch]              orig/release/2.6            -> origin/orig/release/2.6
2025-12-04T09:33:41.9748568Z  * [new branch]              orig/release/2.7            -> origin/orig/release/2.7
2025-12-04T09:33:41.9750439Z  * [new branch]              orig/release/2.8            -> origin/orig/release/2.8
2025-12-04T09:33:41.9751732Z  * [new branch]              orig/release/2.9            -> origin/orig/release/2.9
2025-12-04T09:33:41.9754752Z  * [new branch]              origin/gh/fxdawnn/1/base    -> origin/origin/gh/fxdawnn/1/base
2025-12-04T09:33:41.9755898Z  * [new branch]              origin/gh/fxdawnn/1/orig    -> origin/origin/gh/fxdawnn/1/orig
2025-12-04T09:33:41.9758048Z  * [new branch]              origin/gh/zpcore/14/orig    -> origin/origin/gh/zpcore/14/orig
2025-12-04T09:33:41.9759497Z  * [new branch]              oulgen-patch-1              -> origin/oulgen-patch-1
2025-12-04T09:33:41.9760947Z  * [new branch]              oulgen-patch-2              -> origin/oulgen-patch-2
2025-12-04T09:33:41.9762467Z  * [new branch]              oulgen-patch-3              -> origin/oulgen-patch-3
2025-12-04T09:33:41.9763986Z  * [new branch]              oulgen-patch-4              -> origin/oulgen-patch-4
2025-12-04T09:33:41.9765366Z  * [new branch]              padded-tensor               -> origin/padded-tensor
2025-12-04T09:33:41.9766902Z  * [new branch]              pca2                        -> origin/pca2
2025-12-04T09:33:41.9768399Z  * [new branch]              per_channel_backup          -> origin/per_channel_backup
2025-12-04T09:33:41.9769881Z  * [new branch]              perf_ops                    -> origin/perf_ops
2025-12-04T09:33:41.9771382Z  * [new branch]              perf_ops_2_9                -> origin/perf_ops_2_9
2025-12-04T09:33:41.9772869Z  * [new branch]              pianpwk-patch-1             -> origin/pianpwk-patch-1
2025-12-04T09:33:41.9774661Z  * [new branch]              pianpwk/__draft_debug_mode  -> origin/pianpwk/__draft_debug_mode
2025-12-04T09:33:41.9775984Z  * [new branch]              pianpwk/_debug_mode_for_triton_draft -> origin/pianpwk/_debug_mode_for_triton_draft
2025-12-04T09:33:41.9777168Z  * [new branch]              pianpwk/_debug_nn_module_compile -> origin/pianpwk/_debug_nn_module_compile
2025-12-04T09:33:41.9778304Z  * [new branch]              pianpwk/_draft_triton_11_3  -> origin/pianpwk/_draft_triton_11_3
2025-12-04T09:33:41.9779523Z  * [new branch]              pianpwk/_manual_bucket_draft -> origin/pianpwk/_manual_bucket_draft
2025-12-04T09:33:41.9781025Z  * [new branch]              pianpwk/_profile_w_dispatch_keys -> origin/pianpwk/_profile_w_dispatch_keys
2025-12-04T09:33:41.9782588Z  * [new branch]              pianpwk/_super_draft_debug_mode -> origin/pianpwk/_super_draft_debug_mode
2025-12-04T09:33:41.9784045Z  * [new branch]              pianpwk/_unbacked_local_shard_size -> origin/pianpwk/_unbacked_local_shard_size
2025-12-04T09:33:41.9785266Z  * [new branch]              pianpwk/anomaly_tb          -> origin/pianpwk/anomaly_tb
2025-12-04T09:33:41.9786548Z  * [new branch]              pianpwk/auto_fx_annotate    -> origin/pianpwk/auto_fx_annotate
2025-12-04T09:33:41.9788085Z  * [new branch]              pianpwk/backed_size_oblivious_export -> origin/pianpwk/backed_size_oblivious_export
2025-12-04T09:33:41.9789362Z  * [new branch]              pianpwk/bert_dynamic_perf   -> origin/pianpwk/bert_dynamic_perf
2025-12-04T09:33:41.9790614Z  * [new branch]              pianpwk/debug_fwd_stack_traces -> origin/pianpwk/debug_fwd_stack_traces
2025-12-04T09:33:41.9791931Z  * [new branch]              pianpwk/debug_hash_tensor   -> origin/pianpwk/debug_hash_tensor
2025-12-04T09:33:41.9793313Z  * [new branch]              pianpwk/debug_mode_annotate -> origin/pianpwk/debug_mode_annotate
2025-12-04T09:33:41.9794477Z  * [new branch]              pianpwk/debug_mode_defaults -> origin/pianpwk/debug_mode_defaults
2025-12-04T09:33:41.9795720Z  * [new branch]              pianpwk/debug_mode_hacks    -> origin/pianpwk/debug_mode_hacks
2025-12-04T09:33:41.9797080Z  * [new branch]              pianpwk/debug_mode_opcall_refactor -> origin/pianpwk/debug_mode_opcall_refactor
2025-12-04T09:33:41.9798326Z  * [new branch]              pianpwk/debug_mode_show_ids -> origin/pianpwk/debug_mode_show_ids
2025-12-04T09:33:41.9799551Z  * [new branch]              pianpwk/debug_mode_triton   -> origin/pianpwk/debug_mode_triton
2025-12-04T09:33:41.9801115Z  * [new branch]              pianpwk/debug_show_stack_trace -> origin/pianpwk/debug_show_stack_trace
2025-12-04T09:33:41.9802597Z  * [new branch]              pianpwk/debug_wait_on_collective -> origin/pianpwk/debug_wait_on_collective
2025-12-04T09:33:41.9803958Z  * [new branch]              pianpwk/debugmode_compile_tf -> origin/pianpwk/debugmode_compile_tf
2025-12-04T09:33:41.9805411Z  * [new branch]              pianpwk/dispatch_key_debugging_for_debug -> origin/pianpwk/dispatch_key_debugging_for_debug
2025-12-04T09:33:41.9806605Z  * [new branch]              pianpwk/draft_debug_mode_tfcompile -> origin/pianpwk/draft_debug_mode_tfcompile
2025-12-04T09:33:41.9807846Z  * [new branch]              pianpwk/draft_multikernel_nn -> origin/pianpwk/draft_multikernel_nn
2025-12-04T09:33:41.9809404Z  * [new branch]              pianpwk/draft_multikernel_status_10_5 -> origin/pianpwk/draft_multikernel_status_10_5
2025-12-04T09:33:41.9810725Z  * [new branch]              pianpwk/dtensor_custom_chunk -> origin/pianpwk/dtensor_custom_chunk
2025-12-04T09:33:41.9812168Z  * [new branch]              pianpwk/dtensor_unbacked_keypath -> origin/pianpwk/dtensor_unbacked_keypath
2025-12-04T09:33:41.9813540Z  * [new branch]              pianpwk/event_list_tree     -> origin/pianpwk/event_list_tree
2025-12-04T09:33:41.9814795Z  * [new branch]              pianpwk/false_numel_refs    -> origin/pianpwk/false_numel_refs
2025-12-04T09:33:41.9816075Z  * [new branch]              pianpwk/maybe_guard_rel     -> origin/pianpwk/maybe_guard_rel
2025-12-04T09:33:41.9817446Z  * [new branch]              pianpwk/multikernel_hints_draft -> origin/pianpwk/multikernel_hints_draft
2025-12-04T09:33:41.9818790Z  * [new branch]              pianpwk/no_size_oblivious_slice_scat -> origin/pianpwk/no_size_oblivious_slice_scat
2025-12-04T09:33:41.9820078Z  * [new branch]              pianpwk/oblivious_reshape_view_better -> origin/pianpwk/oblivious_reshape_view_better
2025-12-04T09:33:41.9821257Z  * [new branch]              pianpwk/pre_forward_hook    -> origin/pianpwk/pre_forward_hook
2025-12-04T09:33:41.9822566Z  * [new branch]              pianpwk/skip_python_keys_alternate -> origin/pianpwk/skip_python_keys_alternate
2025-12-04T09:33:41.9823860Z  * [new branch]              pianpwk/skip_python_keys_in_guards -> origin/pianpwk/skip_python_keys_in_guards
2025-12-04T09:33:41.9825020Z  * [new branch]              pianpwk/sym_tokens_draft    -> origin/pianpwk/sym_tokens_draft
2025-12-04T09:33:41.9826285Z  * [new branch]              pianpwk/symint_one_hot      -> origin/pianpwk/symint_one_hot
2025-12-04T09:33:41.9827740Z  * [new branch]              pianpwk/test_pointwise_guard_or_false -> origin/pianpwk/test_pointwise_guard_or_false
2025-12-04T09:33:41.9828941Z  * [new branch]              pianpwk/totally_draft_sym_wrap -> origin/pianpwk/totally_draft_sym_wrap
2025-12-04T09:33:41.9830213Z  * [new branch]              pianpwk/try_dumb_stuff      -> origin/pianpwk/try_dumb_stuff
2025-12-04T09:33:41.9831552Z  * [new branch]              pianpwk/try_dumb_stuff_2    -> origin/pianpwk/try_dumb_stuff_2
2025-12-04T09:33:41.9832891Z  * [new branch]              pianpwk/unbacked_dtensor_mm -> origin/pianpwk/unbacked_dtensor_mm
2025-12-04T09:33:41.9834172Z  * [new branch]              pianpwk/unbacked_tracing_12_2 -> origin/pianpwk/unbacked_tracing_12_2
2025-12-04T09:33:41.9835332Z  * [new branch]              pianpwk/user_symints        -> origin/pianpwk/user_symints
2025-12-04T09:33:41.9836539Z  * [new branch]              pianpwk/wan21_reshape       -> origin/pianpwk/wan21_reshape
2025-12-04T09:33:41.9838368Z  * [new branch]              piz/fix_partial_backward_1112 -> origin/piz/fix_partial_backward_1112
2025-12-04T09:33:41.9839530Z  * [new branch]              piz/prop_cache_clean        -> origin/piz/prop_cache_clean
2025-12-04T09:33:41.9842698Z  * [new branch]              pool-separate               -> origin/pool-separate
2025-12-04T09:33:41.9843094Z  * [new branch]              pr-156087                   -> origin/pr-156087
2025-12-04T09:33:41.9844877Z  * [new branch]              pr/131860                   -> origin/pr/131860
2025-12-04T09:33:41.9846086Z  * [new branch]              predispatch_to              -> origin/predispatch_to
2025-12-04T09:33:41.9847420Z  * [new branch]              protect-c17                 -> origin/protect-c17
2025-12-04T09:33:41.9848836Z  * [new branch]              pt-opt-cuda3                -> origin/pt-opt-cuda3
2025-12-04T09:33:41.9850952Z  * [new branch]              python_compiled_autograd    -> origin/python_compiled_autograd
2025-12-04T09:33:41.9853092Z  * [new branch]              q1l1/fix_device_moved_constant_type_unknown -> origin/q1l1/fix_device_moved_constant_type_unknown
2025-12-04T09:33:41.9854171Z  * [new branch]              q1l1/fix_wrong_default_type_for_kernel_call_args -> origin/q1l1/fix_wrong_default_type_for_kernel_call_args
2025-12-04T09:33:41.9856761Z  * [new branch]              qchip/export-D54134695      -> origin/qchip/export-D54134695
2025-12-04T09:33:41.9858303Z  * [new branch]              quote-pytest_cache          -> origin/quote-pytest_cache
2025-12-04T09:33:41.9859996Z  * [new branch]              reland-accgrad-stream-warn  -> origin/reland-accgrad-stream-warn
2025-12-04T09:33:41.9861855Z  * [new branch]              release/1.10                -> origin/release/1.10
2025-12-04T09:33:41.9863150Z  * [new branch]              release/1.11                -> origin/release/1.11
2025-12-04T09:33:41.9864492Z  * [new branch]              release/1.12                -> origin/release/1.12
2025-12-04T09:33:41.9865828Z  * [new branch]              release/1.13                -> origin/release/1.13
2025-12-04T09:33:41.9867029Z  * [new branch]              release/1.4                 -> origin/release/1.4
2025-12-04T09:33:41.9868100Z  * [new branch]              release/1.4.1               -> origin/release/1.4.1
2025-12-04T09:33:41.9869383Z  * [new branch]              release/1.5                 -> origin/release/1.5
2025-12-04T09:33:41.9870853Z  * [new branch]              release/1.6                 -> origin/release/1.6
2025-12-04T09:33:41.9872220Z  * [new branch]              release/1.7                 -> origin/release/1.7
2025-12-04T09:33:41.9873643Z  * [new branch]              release/1.8                 -> origin/release/1.8
2025-12-04T09:33:41.9874916Z  * [new branch]              release/1.9                 -> origin/release/1.9
2025-12-04T09:33:41.9876266Z  * [new branch]              release/2.0                 -> origin/release/2.0
2025-12-04T09:33:41.9877659Z  * [new branch]              release/2.1                 -> origin/release/2.1
2025-12-04T09:33:41.9879007Z  * [new branch]              release/2.2                 -> origin/release/2.2
2025-12-04T09:33:41.9880681Z  * [new branch]              release/2.3                 -> origin/release/2.3
2025-12-04T09:33:41.9882973Z  * [new branch]              release/2.4                 -> origin/release/2.4
2025-12-04T09:33:41.9884826Z  * [new branch]              release/2.5                 -> origin/release/2.5
2025-12-04T09:33:41.9886204Z  * [new branch]              release/2.6                 -> origin/release/2.6
2025-12-04T09:33:41.9887591Z  * [new branch]              release/2.7                 -> origin/release/2.7
2025-12-04T09:33:41.9889176Z  * [new branch]              release/2.8                 -> origin/release/2.8
2025-12-04T09:33:41.9890568Z  * [new branch]              release/2.9                 -> origin/release/2.9
2025-12-04T09:33:41.9891983Z  * [new branch]              release_notes               -> origin/release_notes
2025-12-04T09:33:41.9893405Z  * [new branch]              remove_pyinterpreter        -> origin/remove_pyinterpreter
2025-12-04T09:33:41.9895165Z  * [new branch]              replace-pytorch-labs-20250812-195836 -> origin/replace-pytorch-labs-20250812-195836
2025-12-04T09:33:41.9896374Z  * [new branch]              replace-pytorch-labs-20250812-200248 -> origin/replace-pytorch-labs-20250812-200248
2025-12-04T09:33:41.9897381Z  * [new branch]              replace-pytorch-labs-20250812-200324 -> origin/replace-pytorch-labs-20250812-200324
2025-12-04T09:33:41.9898803Z  * [new branch]              replace-pytorch-labs-20250812-204020 -> origin/replace-pytorch-labs-20250812-204020
2025-12-04T09:33:41.9901579Z  * [new branch]              revert-131069-gh/krzysztofjordan/1/head -> origin/revert-131069-gh/krzysztofjordan/1/head
2025-12-04T09:33:41.9904647Z  * [new branch]              revert-131469-gh/andrewor14/51/head -> origin/revert-131469-gh/andrewor14/51/head
2025-12-04T09:33:41.9907150Z  * [new branch]              revert-152361-gh/fadara01/1/head -> origin/revert-152361-gh/fadara01/1/head
2025-12-04T09:33:41.9909742Z  * [new branch]              revert-156870-gh/skarjala/3/head -> origin/revert-156870-gh/skarjala/3/head
2025-12-04T09:33:41.9911484Z  * [new branch]              revert-157914-cherry-pick-157503-by-pytorch_bot_bot_ -> origin/revert-157914-cherry-pick-157503-by-pytorch_bot_bot_
2025-12-04T09:33:41.9912655Z  * [new branch]              revert-hoo-invoke-subgraph  -> origin/revert-hoo-invoke-subgraph
2025-12-04T09:33:41.9914029Z  * [new branch]              revert_always_build_distributed -> origin/revert_always_build_distributed
2025-12-04T09:33:41.9915282Z  * [new branch]              rms_norm_patch              -> origin/rms_norm_patch
2025-12-04T09:33:41.9917251Z  * [new branch]              ruisi/fix_all_to_all_estimation -> origin/ruisi/fix_all_to_all_estimation
2025-12-04T09:33:41.9918323Z  * [new branch]              ruisi/fix_comm_estimation   -> origin/ruisi/fix_comm_estimation
2025-12-04T09:33:41.9919689Z  * [new branch]              ruisi/fix_dynamic_shape_estimation -> origin/ruisi/fix_dynamic_shape_estimation
2025-12-04T09:33:41.9920901Z  * [new branch]              ruisi/fix_llama3_autobucketing -> origin/ruisi/fix_llama3_autobucketing
2025-12-04T09:33:41.9922455Z  * [new branch]              ruisi/fix_manual_bucketing_ep_pass -> origin/ruisi/fix_manual_bucketing_ep_pass
2025-12-04T09:33:41.9924208Z  * [new branch]              ruisi/manual_bucket_pass    -> origin/ruisi/manual_bucket_pass
2025-12-04T09:33:41.9926340Z  * [new branch]              ryanguo99/cleanup-dynamo-expected-failures -> origin/ryanguo99/cleanup-dynamo-expected-failures
2025-12-04T09:33:41.9927297Z  * [new branch]              ryanguo99/fix-closure-var   -> origin/ryanguo99/fix-closure-var
2025-12-04T09:33:41.9929166Z  * [new branch]              rzou/faketensor_bench       -> origin/rzou/faketensor_bench
2025-12-04T09:33:41.9930377Z  * [new branch]              rzou/njt                    -> origin/rzou/njt
2025-12-04T09:33:41.9932074Z  * [new branch]              rzou/pca                    -> origin/rzou/pca
2025-12-04T09:33:41.9933774Z  * [new branch]              rzou/realprop               -> origin/rzou/realprop
2025-12-04T09:33:41.9935284Z  * [new branch]              samplevllm                  -> origin/samplevllm
2025-12-04T09:33:41.9937565Z  * [new branch]              sanchitintel/weird_thing_with_test_cpu_select_algorithm -> origin/sanchitintel/weird_thing_with_test_cpu_select_algorithm
2025-12-04T09:33:41.9938762Z  * [new branch]              sapling-pr-archive-SS-JIA   -> origin/sapling-pr-archive-SS-JIA
2025-12-04T09:33:41.9940283Z  * [new branch]              sapling-pr-archive-tushar00jain -> origin/sapling-pr-archive-tushar00jain
2025-12-04T09:33:41.9941490Z  * [new branch]              save                        -> origin/save
2025-12-04T09:33:41.9942879Z  * [new branch]              scaled_mm                   -> origin/scaled_mm
2025-12-04T09:33:41.9944213Z  * [new branch]              scan_attempt                -> origin/scan_attempt
2025-12-04T09:33:41.9946525Z  * [new branch]              sdym/2.5.1                  -> origin/sdym/2.5.1
2025-12-04T09:33:41.9947993Z  * [new branch]              sekyondaMeta-dynamoconfig-fix -> origin/sekyondaMeta-dynamoconfig-fix
2025-12-04T09:33:41.9949625Z  * [new branch]              shengf/fx-xform-perf        -> origin/shengf/fx-xform-perf
2025-12-04T09:33:41.9951082Z  * [new branch]              shoumikhin-patch-1          -> origin/shoumikhin-patch-1
2025-12-04T09:33:41.9952417Z  * [new branch]              solve-accuracy-fix          -> origin/solve-accuracy-fix
2025-12-04T09:33:41.9953764Z  * [new branch]              some_rocm_inductor_skips    -> origin/some_rocm_inductor_skips
2025-12-04T09:33:41.9955521Z  * [new branch]              soulitzer/stash-tls-ac      -> origin/soulitzer/stash-tls-ac
2025-12-04T09:33:41.9956949Z  * [new branch]              sparse-mm-bf16-support      -> origin/sparse-mm-bf16-support
2025-12-04T09:33:41.9958301Z  * [new branch]              starterTaskUpdate           -> origin/starterTaskUpdate
2025-12-04T09:33:41.9959647Z  * [new branch]              suo                         -> origin/suo
2025-12-04T09:33:41.9961033Z  * [new branch]              sve-poc                     -> origin/sve-poc
2025-12-04T09:33:41.9962507Z  * [new branch]              switch-bn                   -> origin/switch-bn
2025-12-04T09:33:41.9964007Z  * [new branch]              sy_annotation_in_autograd_hop -> origin/sy_annotation_in_autograd_hop
2025-12-04T09:33:41.9965327Z  * [new branch]              sy_aot_eager_record         -> origin/sy_aot_eager_record
2025-12-04T09:33:41.9966821Z  * [new branch]              sy_custom_bucketing         -> origin/sy_custom_bucketing
2025-12-04T09:33:41.9968291Z  * [new branch]              sy_debug_mode_test          -> origin/sy_debug_mode_test
2025-12-04T09:33:41.9969650Z  * [new branch]              sy_deserialize              -> origin/sy_deserialize
2025-12-04T09:33:41.9970990Z  * [new branch]              sy_dump_gm_code             -> origin/sy_dump_gm_code
2025-12-04T09:33:41.9972311Z  * [new branch]              sy_exp                      -> origin/sy_exp
2025-12-04T09:33:41.9973740Z  * [new branch]              sy_export_annotation        -> origin/sy_export_annotation
2025-12-04T09:33:41.9975094Z  * [new branch]              sy_invoke_subgraph          -> origin/sy_invoke_subgraph
2025-12-04T09:33:41.9976485Z  * [new branch]              sy_kernel_bw_name           -> origin/sy_kernel_bw_name
2025-12-04T09:33:41.9977809Z  * [new branch]              sy_multi_arch               -> origin/sy_multi_arch
2025-12-04T09:33:41.9979179Z  * [new branch]              sy_nn_module_stack          -> origin/sy_nn_module_stack
2025-12-04T09:33:41.9980549Z  * [new branch]              sy_original_dtensor         -> origin/sy_original_dtensor
2025-12-04T09:33:41.9981853Z  * [new branch]              sy_profiler_cia             -> origin/sy_profiler_cia
2025-12-04T09:33:41.9983211Z  * [new branch]              symm_mem_sync               -> origin/symm_mem_sync
2025-12-04T09:33:41.9984635Z  * [new branch]              sympy-bottleneck-repro      -> origin/sympy-bottleneck-repro
2025-12-04T09:33:41.9986018Z  * [new branch]              tensordict_integration      -> origin/tensordict_integration
2025-12-04T09:33:41.9987498Z  * [new branch]              test-move-conda-builds      -> origin/test-move-conda-builds
2025-12-04T09:33:41.9988895Z  * [new branch]              test-old                    -> origin/test-old
2025-12-04T09:33:41.9990656Z  * [new branch]              test/bmm_heur               -> origin/test/bmm_heur
2025-12-04T09:33:41.9992500Z  * [new branch]              tianren/customOp_autotune_fix -> origin/tianren/customOp_autotune_fix
2025-12-04T09:33:41.9993847Z  * [new branch]              tianren/customOp_enable_max_autotune -> origin/tianren/customOp_enable_max_autotune
2025-12-04T09:33:41.9994986Z  * [new branch]              tianren/customOp_fusion     -> origin/tianren/customOp_fusion
2025-12-04T09:33:41.9996388Z  * [new branch]              tianren/customop_collectiveop_benchmark -> origin/tianren/customop_collectiveop_benchmark
2025-12-04T09:33:41.9997954Z  * [new branch]              tianren/customop_collectiveop_benchmark_fix -> origin/tianren/customop_collectiveop_benchmark_fix
2025-12-04T09:33:41.9999577Z  * [new branch]              tianren/customop_dynamic_config -> origin/tianren/customop_dynamic_config
2025-12-04T09:33:42.0000938Z  * [new branch]              tianren/dynamic_range_input -> origin/tianren/dynamic_range_input
2025-12-04T09:33:42.0005289Z  * [new branch]              tianren/dynamic_range_input_fix -> origin/tianren/dynamic_range_input_fix
2025-12-04T09:33:42.0006533Z  * [new branch]              tianren/dynamic_range_input_merge -> origin/tianren/dynamic_range_input_merge
2025-12-04T09:33:42.0007769Z  * [new branch]              tianren/flex_paged_attn_fix_temp -> origin/tianren/flex_paged_attn_fix_temp
2025-12-04T09:33:42.0009103Z  * [new branch]              tianren/fx_codegen_dump     -> origin/tianren/fx_codegen_dump
2025-12-04T09:33:42.0010399Z  * [new branch]              tianren/symmetric_memory    -> origin/tianren/symmetric_memory
2025-12-04T09:33:42.0011663Z  * [new branch]              tianren/test                -> origin/tianren/test
2025-12-04T09:33:42.0013080Z  * [new branch]              tidy_performance_cyy        -> origin/tidy_performance_cyy
2025-12-04T09:33:42.0014375Z  * [new branch]              tmp                         -> origin/tmp
2025-12-04T09:33:42.0015778Z  * [new branch]              torchtitan_ep               -> origin/torchtitan_ep
2025-12-04T09:33:42.0017209Z  * [new branch]              torchtitan_integration      -> origin/torchtitan_integration
2025-12-04T09:33:42.0018738Z  * [new branch]              trace_fsdp_torchtune_lora   -> origin/trace_fsdp_torchtune_lora
2025-12-04T09:33:42.0019941Z  * [new branch]              traceable_fsdp_unit_tests   -> origin/traceable_fsdp_unit_tests
2025-12-04T09:33:42.0021359Z  * [new branch]              tree_loop_vec_base          -> origin/tree_loop_vec_base
2025-12-04T09:33:42.0022714Z  * [new branch]              triton_kernel               -> origin/triton_kernel
2025-12-04T09:33:42.0024073Z  * [new branch]              tt_pkg_1908                 -> origin/tt_pkg_1908
2025-12-04T09:33:42.0025915Z  * [new branch]              type_dec                    -> origin/type_dec
2025-12-04T09:33:42.0027394Z  * [new branch]              udate-sphinx-dependancies   -> origin/udate-sphinx-dependancies
2025-12-04T09:33:42.0029398Z  * [new branch]              update-audio-commit-hash/17630256502-1803-1 -> origin/update-audio-commit-hash/17630256502-1803-1
2025-12-04T09:33:42.0030643Z  * [new branch]              update-audio-commit-hash/19087141161-1916-1 -> origin/update-audio-commit-hash/19087141161-1916-1
2025-12-04T09:33:42.0031926Z  * [new branch]              update-audio-commit-hash/19250643381-1929-1 -> origin/update-audio-commit-hash/19250643381-1929-1
2025-12-04T09:33:42.0033262Z  * [new branch]              update-audio-commit-hash/19397724337-1935-1 -> origin/update-audio-commit-hash/19397724337-1935-1
2025-12-04T09:33:42.0034376Z  * [new branch]              update-audio-commit-hash/19555670148-1941-1 -> origin/update-audio-commit-hash/19555670148-1941-1
2025-12-04T09:33:42.0035954Z  * [new branch]              update-audio-commit-hash/19750627930-1946-1 -> origin/update-audio-commit-hash/19750627930-1946-1
2025-12-04T09:33:42.0037801Z  * [new branch]              update-triton-commit-hash/13663274526-1487-2 -> origin/update-triton-commit-hash/13663274526-1487-2
2025-12-04T09:33:42.0039531Z  * [new branch]              update-vision-commit-hash/19087141161-1916-1 -> origin/update-vision-commit-hash/19087141161-1916-1
2025-12-04T09:33:42.0040768Z  * [new branch]              update-vision-commit-hash/19184897099-1925-1 -> origin/update-vision-commit-hash/19184897099-1925-1
2025-12-04T09:33:42.0041863Z  * [new branch]              update-vision-commit-hash/19250643381-1929-1 -> origin/update-vision-commit-hash/19250643381-1929-1
2025-12-04T09:33:42.0043453Z  * [new branch]              update-vision-commit-hash/19381328640-1934-1 -> origin/update-vision-commit-hash/19381328640-1934-1
2025-12-04T09:33:42.0044560Z  * [new branch]              update-vision-commit-hash/19485237164-1938-1 -> origin/update-vision-commit-hash/19485237164-1938-1
2025-12-04T09:33:42.0046498Z  * [new branch]              update-vllm-commit-hash/18451675449-1879-1 -> origin/update-vllm-commit-hash/18451675449-1879-1
2025-12-04T09:33:42.0047823Z  * [new branch]              update-vllm-dockerfile      -> origin/update-vllm-dockerfile
2025-12-04T09:33:42.0049726Z  * [new branch]              update-xla-commit-hash/19224287370-211-1 -> origin/update-xla-commit-hash/19224287370-211-1
2025-12-04T09:33:42.0051054Z  * [new branch]              update-xla-commit-hash/19422028566-212-1 -> origin/update-xla-commit-hash/19422028566-212-1
2025-12-04T09:33:42.0052221Z  * [new branch]              update-xla-commit-hash/19626841311-213-1 -> origin/update-xla-commit-hash/19626841311-213-1
2025-12-04T09:33:42.0053676Z  * [new branch]              update_docs_torch_multinomial_issue#125388 -> origin/update_docs_torch_multinomial_issue#125388
2025-12-04T09:33:42.0054958Z  * [new branch]              update_operator_readme      -> origin/update_operator_readme
2025-12-04T09:33:42.0056341Z  * [new branch]              update_slow_tests_1722488736 -> origin/update_slow_tests_1722488736
2025-12-04T09:33:42.0058183Z  * [new branch]              update_slow_tests_1722879173 -> origin/update_slow_tests_1722879173
2025-12-04T09:33:42.0059536Z  * [new branch]              update_slow_tests_1762155677 -> origin/update_slow_tests_1762155677
2025-12-04T09:33:42.0060977Z  * [new branch]              update_slow_tests_1763365283 -> origin/update_slow_tests_1763365283
2025-12-04T09:33:42.0062246Z  * [new branch]              update_submodule_FBGEMM     -> origin/update_submodule_FBGEMM
2025-12-04T09:33:42.0063696Z  * [new branch]              update_submodule_kineto     -> origin/update_submodule_kineto
2025-12-04T09:33:42.0065157Z  * [new branch]              update_submodule_tensorpipe -> origin/update_submodule_tensorpipe
2025-12-04T09:33:42.0066503Z  * [new branch]              upload-tests-for-autorevert -> origin/upload-tests-for-autorevert
2025-12-04T09:33:42.0067902Z  * [new branch]              v0.1.2                      -> origin/v0.1.2
2025-12-04T09:33:42.0069406Z  * [new branch]              v1.0.1                      -> origin/v1.0.1
2025-12-04T09:33:42.0070870Z  * [new branch]              v1.0.3                      -> origin/v1.0.3
2025-12-04T09:33:42.0072465Z  * [new branch]              v1.1.0                      -> origin/v1.1.0
2025-12-04T09:33:42.0074053Z  * [new branch]              v1.2.0                      -> origin/v1.2.0
2025-12-04T09:33:42.0075461Z  * [new branch]              v1.3.0                      -> origin/v1.3.0
2025-12-04T09:33:42.0076924Z  * [new branch]              v1.3.1                      -> origin/v1.3.1
2025-12-04T09:33:42.0078365Z  * [new branch]              validate_fn                 -> origin/validate_fn
2025-12-04T09:33:42.0079880Z  * [new branch]              validations_2.6             -> origin/validations_2.6
2025-12-04T09:33:42.0081368Z  * [new branch]              validations_2.8             -> origin/validations_2.8
2025-12-04T09:33:42.0082787Z  * [new branch]              varlen-api                  -> origin/varlen-api
2025-12-04T09:33:42.0084214Z  * [new branch]              varlen-api-backup           -> origin/varlen-api-backup
2025-12-04T09:33:42.0085527Z  * [new branch]              varlen_batch_invariance     -> origin/varlen_batch_invariance
2025-12-04T09:33:42.0087130Z  * [new branch]              viable/strict               -> origin/viable/strict
2025-12-04T09:33:42.0089139Z  * [new branch]              vishal9-team/dtensor_parallelism_toy -> origin/vishal9-team/dtensor_parallelism_toy
2025-12-04T09:33:42.0090306Z  * [new branch]              vllmbuildci                 -> origin/vllmbuildci
2025-12-04T09:33:42.0091738Z  * [new branch]              vllmpin                     -> origin/vllmpin
2025-12-04T09:33:42.0093288Z  * [new branch]              vscode-recommend-pyrefly    -> origin/vscode-recommend-pyrefly
2025-12-04T09:33:42.0094798Z  * [new branch]              wdvr-patch-1                -> origin/wdvr-patch-1
2025-12-04T09:33:42.0096547Z  * [new branch]              wdvr/iss_145259             -> origin/wdvr/iss_145259
2025-12-04T09:33:42.0098251Z  * [new branch]              whc/pei                     -> origin/whc/pei
2025-12-04T09:33:42.0099521Z  * [new branch]              whc/pp_fix                  -> origin/whc/pp_fix
2025-12-04T09:33:42.0101040Z  * [new branch]              whc/sharding                -> origin/whc/sharding
2025-12-04T09:33:42.0102386Z  * [new branch]              whc/sharding2               -> origin/whc/sharding2
2025-12-04T09:33:42.0103485Z  * [new branch]              whc/uneven                  -> origin/whc/uneven
2025-12-04T09:33:42.0105196Z  * [new branch]              whc/uneven-merge            -> origin/whc/uneven-merge
2025-12-04T09:33:42.0106601Z  * [new branch]              win_warnings                -> origin/win_warnings
2025-12-04T09:33:42.0107930Z  * [new branch]              windows_libtorch_free       -> origin/windows_libtorch_free
2025-12-04T09:33:42.0109261Z  * [new branch]              xmfan-war                   -> origin/xmfan-war
2025-12-04T09:33:42.0110995Z  * [new branch]              xmfan/ca_0516               -> origin/xmfan/ca_0516
2025-12-04T09:33:42.0112265Z  * [new branch]              xmfan/ca_1051b93192         -> origin/xmfan/ca_1051b93192
2025-12-04T09:33:42.0113837Z  * [new branch]              xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8 -> origin/xmfan/ca_1a722f62c248391fc4a542e8851a5559aa356ae8
2025-12-04T09:33:42.0114532Z  * [new branch]              xmfan/ca_5a2be192d1         -> origin/xmfan/ca_5a2be192d1
2025-12-04T09:33:42.0115863Z  * [new branch]              xmfan/ca_9d59b516e9         -> origin/xmfan/ca_9d59b516e9
2025-12-04T09:33:42.0117018Z  * [new branch]              xmfan/ca_apr8               -> origin/xmfan/ca_apr8
2025-12-04T09:33:42.0118218Z  * [new branch]              xmfan/ca_base               -> origin/xmfan/ca_base
2025-12-04T09:33:42.0119686Z  * [new branch]              xmfan/ca_dynamic            -> origin/xmfan/ca_dynamic
2025-12-04T09:33:42.0121304Z  * [new branch]              xmfan/ca_fix_dyn            -> origin/xmfan/ca_fix_dyn
2025-12-04T09:33:42.0122719Z  * [new branch]              xmfan/ca_fix_lowering       -> origin/xmfan/ca_fix_lowering
2025-12-04T09:33:42.0124009Z  * [new branch]              xmfan/ca_fix_polyfills      -> origin/xmfan/ca_fix_polyfills
2025-12-04T09:33:42.0125141Z  * [new branch]              xmfan/ca_jan3               -> origin/xmfan/ca_jan3
2025-12-04T09:33:42.0126382Z  * [new branch]              xmfan/ca_jun18              -> origin/xmfan/ca_jun18
2025-12-04T09:33:42.0127701Z  * [new branch]              xmfan/ca_jun24              -> origin/xmfan/ca_jun24
2025-12-04T09:33:42.0128930Z  * [new branch]              xmfan/ca_nested             -> origin/xmfan/ca_nested
2025-12-04T09:33:42.0130183Z  * [new branch]              xmfan/ca_overhead           -> origin/xmfan/ca_overhead
2025-12-04T09:33:42.0131547Z  * [new branch]              xmfan/ca_overhead_0eba7e5451 -> origin/xmfan/ca_overhead_0eba7e5451
2025-12-04T09:33:42.0132701Z  * [new branch]              xmfan/cacu_jun18            -> origin/xmfan/cacu_jun18
2025-12-04T09:33:42.0134054Z  * [new branch]              xmfan/cacu_jun19            -> origin/xmfan/cacu_jun19
2025-12-04T09:33:42.0135261Z  * [new branch]              xmfan/cacu_jun4             -> origin/xmfan/cacu_jun4
2025-12-04T09:33:42.0136580Z  * [new branch]              xmfan/disable_duck_shape    -> origin/xmfan/disable_duck_shape
2025-12-04T09:33:42.0138318Z  * [new branch]              xmfan/fca_cpp_node_passthrough -> origin/xmfan/fca_cpp_node_passthrough
2025-12-04T09:33:42.0139798Z  * [new branch]              xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/post_3945954741e2d37023c5d6954f9483008e0892f9
2025-12-04T09:33:42.0141111Z  * [new branch]              xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9 -> origin/xmfan/pre_3945954741e2d37023c5d6954f9483008e0892f9
2025-12-04T09:33:42.0141955Z  * [new branch]              xmfan/single_step           -> origin/xmfan/single_step
2025-12-04T09:33:42.0143380Z  * [new branch]              xmfan/sth_0829              -> origin/xmfan/sth_0829
2025-12-04T09:33:42.0144676Z  * [new branch]              xmfan/test                  -> origin/xmfan/test
2025-12-04T09:33:42.0146566Z  * [new branch]              yguo/debug-0226-constexpr   -> origin/yguo/debug-0226-constexpr
2025-12-04T09:33:42.0147735Z  * [new branch]              yguo/new_latest_changes     -> origin/yguo/new_latest_changes
2025-12-04T09:33:42.0148964Z  * [new branch]              yguo/patch_constexpr_changes -> origin/yguo/patch_constexpr_changes
2025-12-04T09:33:42.0150619Z  * [new branch]              yiming/bootcamp             -> origin/yiming/bootcamp
2025-12-04T09:33:42.0152094Z  * [new branch]              yiming/run_with_start_end_rng_hop -> origin/yiming/run_with_start_end_rng_hop
2025-12-04T09:33:42.0153758Z  * [new branch]              yolo-llama3                 -> origin/yolo-llama3
2025-12-04T09:33:42.0155517Z  * [new branch]              zainr/canary-test           -> origin/zainr/canary-test
2025-12-04T09:33:42.0156961Z  * [new branch]              zainr/cleanup-gh-runners    -> origin/zainr/cleanup-gh-runners
2025-12-04T09:33:42.0158178Z  * [new branch]              zainr/pull-migration-c      -> origin/zainr/pull-migration-c
2025-12-04T09:33:42.0159389Z  * [new branch]              zainr/test2                 -> origin/zainr/test2
2025-12-04T09:33:42.0161002Z  * [new branch]              zasdfgbnm-patch-3           -> origin/zasdfgbnm-patch-3
2025-12-04T09:33:42.0162272Z  * [new branch]              zb2p                        -> origin/zb2p
2025-12-04T09:33:42.0163815Z  * [new branch]              zeros-and-scatter-part2     -> origin/zeros-and-scatter-part2
2025-12-04T09:33:42.0166369Z  * [new branch]              zhxchen17/ci/vllm_lora_oom  -> origin/zhxchen17/ci/vllm_lora_oom
2025-12-04T09:33:42.0167678Z  * [new branch]              zhxchen17/ci/vllm_multimodal_oom -> origin/zhxchen17/ci/vllm_multimodal_oom
2025-12-04T09:33:42.0168813Z  * [new branch]              zhxchen17/ci/vllm_pin       -> origin/zhxchen17/ci/vllm_pin
2025-12-04T09:33:42.0170699Z  * [new branch]              zhxchen17/dynamo/unsafe_drop_all_guards -> origin/zhxchen17/dynamo/unsafe_drop_all_guards
2025-12-04T09:33:42.0172418Z  * [new branch]              zhxchen17/export/call_override -> origin/zhxchen17/export/call_override
2025-12-04T09:33:42.0173583Z  * [new branch]              zhxchen17/export/codemod1   -> origin/zhxchen17/export/codemod1
2025-12-04T09:33:42.0174938Z  * [new branch]              zhxchen17/export/ctx_return -> origin/zhxchen17/export/ctx_return
2025-12-04T09:33:42.0176326Z  * [new branch]              zhxchen17/export/disable_side_effect_warn -> origin/zhxchen17/export/disable_side_effect_warn
2025-12-04T09:33:42.0177639Z  * [new branch]              zhxchen17/export/pytree_check -> origin/zhxchen17/export/pytree_check
2025-12-04T09:33:42.0179312Z  * [new branch]              zhxchen17/precompile/aoti   -> origin/zhxchen17/precompile/aoti
2025-12-04T09:33:42.0180594Z  * [new branch]              zhxchen17/precompile/globals -> origin/zhxchen17/precompile/globals
2025-12-04T09:33:42.0181992Z  * [new branch]              zhxchen17/precompile/inductor_guards -> origin/zhxchen17/precompile/inductor_guards
2025-12-04T09:33:42.0183432Z  * [new branch]              zhxchen17/scratch/0         -> origin/zhxchen17/scratch/0
2025-12-04T09:33:42.0184880Z  * [new branch]              zhxchen17/torch_export_api_update -> origin/zhxchen17/torch_export_api_update
2025-12-04T09:33:42.0186639Z  * [new branch]              zhxhcen17/moodycamel        -> origin/zhxhcen17/moodycamel
2025-12-04T09:33:42.0188539Z  * [new branch]              zxiiro/build-times          -> origin/zxiiro/build-times
2025-12-04T09:33:42.0189796Z  * [new branch]              zxiiro/c7i.2xlarge          -> origin/zxiiro/c7i.2xlarge
2025-12-04T09:33:42.0191116Z  * [new branch]              zxiiro/c7i.2xlarge.h100     -> origin/zxiiro/c7i.2xlarge.h100
2025-12-04T09:33:42.0192365Z  * [new branch]              zxiiro/main                 -> origin/zxiiro/main
2025-12-04T09:33:42.0193632Z  * [new branch]              zxiiro/risc64               -> origin/zxiiro/risc64
2025-12-04T09:33:42.0195065Z  * [new branch]              zxiiro/test-multicloud-arc  -> origin/zxiiro/test-multicloud-arc
2025-12-04T09:33:42.0196412Z  * [new tag]                 bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug -> bc2caa7fdf006894eff7af936babde69ab5a40f8-huydhn-debug
2025-12-04T09:33:42.0197246Z  * [new tag]                 ci/binaries/77164           -> ci/binaries/77164
2025-12-04T09:33:42.0198491Z  * [new tag]                 ciflow/b200/115316          -> ciflow/b200/115316
2025-12-04T09:33:42.0199235Z  * [new tag]                 ciflow/b200/160685          -> ciflow/b200/160685
2025-12-04T09:33:42.0200088Z  * [new tag]                 ciflow/b200/161607          -> ciflow/b200/161607
2025-12-04T09:33:42.0201004Z  * [new tag]                 ciflow/b200/161938          -> ciflow/b200/161938
2025-12-04T09:33:42.0202370Z  * [new tag]                 ciflow/b200/167207          -> ciflow/b200/167207
2025-12-04T09:33:42.0203136Z  * [new tag]                 ciflow/b200/167989          -> ciflow/b200/167989
2025-12-04T09:33:42.0204198Z  * [new tag]                 ciflow/b200/168096          -> ciflow/b200/168096
2025-12-04T09:33:42.0205166Z  * [new tag]                 ciflow/b200/168175          -> ciflow/b200/168175
2025-12-04T09:33:42.0206023Z  * [new tag]                 ciflow/b200/168195          -> ciflow/b200/168195
2025-12-04T09:33:42.0206870Z  * [new tag]                 ciflow/b200/169200          -> ciflow/b200/169200
2025-12-04T09:33:42.0207925Z  * [new tag]                 ciflow/b200/169216          -> ciflow/b200/169216
2025-12-04T09:33:42.0209338Z  * [new tag]                 ciflow/b200/169380          -> ciflow/b200/169380
2025-12-04T09:33:42.0210747Z  * [new tag]                 ciflow/b200/169412          -> ciflow/b200/169412
2025-12-04T09:33:42.0211605Z  * [new tag]                 ciflow/b200/169470          -> ciflow/b200/169470
2025-12-04T09:33:42.0212711Z  * [new tag]                 ciflow/b200/169471          -> ciflow/b200/169471
2025-12-04T09:33:42.0213463Z  * [new tag]                 ciflow/b200/169472          -> ciflow/b200/169472
2025-12-04T09:33:42.0214709Z  * [new tag]                 ciflow/b200/169514          -> ciflow/b200/169514
2025-12-04T09:33:42.0215442Z  * [new tag]                 ciflow/b200/169517          -> ciflow/b200/169517
2025-12-04T09:33:42.0216713Z  * [new tag]                 ciflow/binaries/165922      -> ciflow/binaries/165922
2025-12-04T09:33:42.0217450Z  * [new tag]                 ciflow/binaries/169510      -> ciflow/binaries/169510
2025-12-04T09:33:42.0218725Z  * [new tag]                 ciflow/binaries_wheel/157994 -> ciflow/binaries_wheel/157994
2025-12-04T09:33:42.0219555Z  * [new tag]                 ciflow/binaries_wheel/166829 -> ciflow/binaries_wheel/166829
2025-12-04T09:33:42.0220369Z  * [new tag]                 ciflow/binaries_wheel/167972 -> ciflow/binaries_wheel/167972
2025-12-04T09:33:42.0221955Z  * [new tag]                 ciflow/binaries_wheel/167981 -> ciflow/binaries_wheel/167981
2025-12-04T09:33:42.0222573Z  * [new tag]                 ciflow/dynamo/167695        -> ciflow/dynamo/167695
2025-12-04T09:33:42.0223438Z  * [new tag]                 ciflow/dynamo/168096        -> ciflow/dynamo/168096
2025-12-04T09:33:42.0224459Z  * [new tag]                 ciflow/dynamo/169525        -> ciflow/dynamo/169525
2025-12-04T09:33:42.0225538Z  * [new tag]                 ciflow/h100-cutlass-backend/161938 -> ciflow/h100-cutlass-backend/161938
2025-12-04T09:33:42.0226301Z  * [new tag]                 ciflow/h100-cutlass-backend/161940 -> ciflow/h100-cutlass-backend/161940
2025-12-04T09:33:42.0227453Z  * [new tag]                 ciflow/h100-distributed/168923 -> ciflow/h100-distributed/168923
2025-12-04T09:33:42.0229026Z  * [new tag]                 ciflow/h100-symm-mem/167552 -> ciflow/h100-symm-mem/167552
2025-12-04T09:33:42.0229820Z  * [new tag]                 ciflow/h100-symm-mem/168129 -> ciflow/h100-symm-mem/168129
2025-12-04T09:33:42.0230591Z  * [new tag]                 ciflow/h100-symm-mem/168917 -> ciflow/h100-symm-mem/168917
2025-12-04T09:33:42.0231885Z  * [new tag]                 ciflow/h100-symm-mem/169156 -> ciflow/h100-symm-mem/169156
2025-12-04T09:33:42.0232668Z  * [new tag]                 ciflow/h100-symm-mem/169200 -> ciflow/h100-symm-mem/169200
2025-12-04T09:33:42.0233489Z  * [new tag]                 ciflow/h100-symm-mem/169216 -> ciflow/h100-symm-mem/169216
2025-12-04T09:33:42.0234321Z  * [new tag]                 ciflow/h100-symm-mem/169338 -> ciflow/h100-symm-mem/169338
2025-12-04T09:33:42.0235386Z  * [new tag]                 ciflow/h100-symm-mem/169355 -> ciflow/h100-symm-mem/169355
2025-12-04T09:33:42.0236076Z  * [new tag]                 ciflow/h100-symm-mem/169543 -> ciflow/h100-symm-mem/169543
2025-12-04T09:33:42.0237235Z  * [new tag]                 ciflow/h100/115316          -> ciflow/h100/115316
2025-12-04T09:33:42.0237930Z  * [new tag]                 ciflow/h100/160685          -> ciflow/h100/160685
2025-12-04T09:33:42.0239541Z  * [new tag]                 ciflow/h100/160729          -> ciflow/h100/160729
2025-12-04T09:33:42.0240283Z  * [new tag]                 ciflow/h100/161607          -> ciflow/h100/161607
2025-12-04T09:33:42.0240660Z  * [new tag]                 ciflow/h100/161938          -> ciflow/h100/161938
2025-12-04T09:33:42.0241598Z  * [new tag]                 ciflow/h100/167207          -> ciflow/h100/167207
2025-12-04T09:33:42.0242294Z  * [new tag]                 ciflow/h100/167989          -> ciflow/h100/167989
2025-12-04T09:33:42.0243375Z  * [new tag]                 ciflow/h100/168096          -> ciflow/h100/168096
2025-12-04T09:33:42.0243990Z  * [new tag]                 ciflow/h100/168175          -> ciflow/h100/168175
2025-12-04T09:33:42.0244776Z  * [new tag]                 ciflow/h100/168195          -> ciflow/h100/168195
2025-12-04T09:33:42.0245624Z  * [new tag]                 ciflow/h100/168980          -> ciflow/h100/168980
2025-12-04T09:33:42.0247171Z  * [new tag]                 ciflow/h100/169200          -> ciflow/h100/169200
2025-12-04T09:33:42.0247955Z  * [new tag]                 ciflow/h100/169216          -> ciflow/h100/169216
2025-12-04T09:33:42.0249089Z  * [new tag]                 ciflow/h100/169380          -> ciflow/h100/169380
2025-12-04T09:33:42.0249850Z  * [new tag]                 ciflow/h100/169412          -> ciflow/h100/169412
2025-12-04T09:33:42.0250717Z  * [new tag]                 ciflow/h100/169470          -> ciflow/h100/169470
2025-12-04T09:33:42.0251668Z  * [new tag]                 ciflow/h100/169471          -> ciflow/h100/169471
2025-12-04T09:33:42.0252451Z  * [new tag]                 ciflow/h100/169472          -> ciflow/h100/169472
2025-12-04T09:33:42.0253279Z  * [new tag]                 ciflow/h100/169514          -> ciflow/h100/169514
2025-12-04T09:33:42.0254460Z  * [new tag]                 ciflow/inductor-cu126/168096 -> ciflow/inductor-cu126/168096
2025-12-04T09:33:42.0255870Z  * [new tag]                 ciflow/inductor-micro-benchmark-cpu-x86/168096 -> ciflow/inductor-micro-benchmark-cpu-x86/168096
2025-12-04T09:33:42.0256674Z  * [new tag]                 ciflow/inductor-micro-benchmark/166165 -> ciflow/inductor-micro-benchmark/166165
2025-12-04T09:33:42.0257538Z  * [new tag]                 ciflow/inductor-micro-benchmark/168096 -> ciflow/inductor-micro-benchmark/168096
2025-12-04T09:33:42.0258473Z  * [new tag]                 ciflow/inductor-perf-compare/168096 -> ciflow/inductor-perf-compare/168096
2025-12-04T09:33:42.0260087Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/168073 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168073
2025-12-04T09:33:42.0261371Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/168096 -> ciflow/inductor-perf-test-nightly-rocm-mi300/168096
2025-12-04T09:33:42.0262312Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi300/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi300/169024
2025-12-04T09:33:42.0263387Z  * [new tag]                 ciflow/inductor-perf-test-nightly-rocm-mi355/169024 -> ciflow/inductor-perf-test-nightly-rocm-mi355/169024
2025-12-04T09:33:42.0264294Z  * [new tag]                 ciflow/inductor-perf-test-nightly/168096 -> ciflow/inductor-perf-test-nightly/168096
2025-12-04T09:33:42.0265186Z  * [new tag]                 ciflow/inductor-periodic/168096 -> ciflow/inductor-periodic/168096
2025-12-04T09:33:42.0265977Z  * [new tag]                 ciflow/inductor-periodic/169024 -> ciflow/inductor-periodic/169024
2025-12-04T09:33:42.0266946Z  * [new tag]                 ciflow/inductor-periodic/169425 -> ciflow/inductor-periodic/169425
2025-12-04T09:33:42.0268211Z  * [new tag]                 ciflow/inductor-rocm-mi200/165545 -> ciflow/inductor-rocm-mi200/165545
2025-12-04T09:33:42.0269065Z  * [new tag]                 ciflow/inductor-rocm-mi200/165997 -> ciflow/inductor-rocm-mi200/165997
2025-12-04T09:33:42.0269865Z  * [new tag]                 ciflow/inductor-rocm-mi200/168096 -> ciflow/inductor-rocm-mi200/168096
2025-12-04T09:33:42.0270810Z  * [new tag]                 ciflow/inductor-rocm-mi200/169063 -> ciflow/inductor-rocm-mi200/169063
2025-12-04T09:33:42.0271625Z  * [new tag]                 ciflow/inductor-rocm-mi200/169425 -> ciflow/inductor-rocm-mi200/169425
2025-12-04T09:33:42.0272838Z  * [new tag]                 ciflow/inductor-rocm-mi300/165545 -> ciflow/inductor-rocm-mi300/165545
2025-12-04T09:33:42.0273477Z  * [new tag]                 ciflow/inductor-rocm-mi300/168096 -> ciflow/inductor-rocm-mi300/168096
2025-12-04T09:33:42.0274289Z  * [new tag]                 ciflow/inductor-rocm-mi300/169063 -> ciflow/inductor-rocm-mi300/169063
2025-12-04T09:33:42.0275079Z  * [new tag]                 ciflow/inductor-rocm-mi300/169425 -> ciflow/inductor-rocm-mi300/169425
2025-12-04T09:33:42.0276276Z  * [new tag]                 ciflow/inductor-rocm/162052 -> ciflow/inductor-rocm/162052
2025-12-04T09:33:42.0277010Z  * [new tag]                 ciflow/inductor-rocm/168971 -> ciflow/inductor-rocm/168971
2025-12-04T09:33:42.0278109Z  * [new tag]                 ciflow/inductor-windows/168096 -> ciflow/inductor-windows/168096
2025-12-04T09:33:42.0278964Z  * [new tag]                 ciflow/inductor/144542      -> ciflow/inductor/144542
2025-12-04T09:33:42.0279747Z  * [new tag]                 ciflow/inductor/146506      -> ciflow/inductor/146506
2025-12-04T09:33:42.0280580Z  * [new tag]                 ciflow/inductor/147990      -> ciflow/inductor/147990
2025-12-04T09:33:42.0281590Z  * [new tag]                 ciflow/inductor/148294      -> ciflow/inductor/148294
2025-12-04T09:33:42.0282368Z  * [new tag]                 ciflow/inductor/148492      -> ciflow/inductor/148492
2025-12-04T09:33:42.0283331Z  * [new tag]                 ciflow/inductor/157149      -> ciflow/inductor/157149
2025-12-04T09:33:42.0284159Z  * [new tag]                 ciflow/inductor/157994      -> ciflow/inductor/157994
2025-12-04T09:33:42.0285196Z  * [new tag]                 ciflow/inductor/160685      -> ciflow/inductor/160685
2025-12-04T09:33:42.0285867Z  * [new tag]                 ciflow/inductor/160686      -> ciflow/inductor/160686
2025-12-04T09:33:42.0286698Z  * [new tag]                 ciflow/inductor/160687      -> ciflow/inductor/160687
2025-12-04T09:33:42.0287505Z  * [new tag]                 ciflow/inductor/160688      -> ciflow/inductor/160688
2025-12-04T09:33:42.0288715Z  * [new tag]                 ciflow/inductor/160706      -> ciflow/inductor/160706
2025-12-04T09:33:42.0289900Z  * [new tag]                 ciflow/inductor/160729      -> ciflow/inductor/160729
2025-12-04T09:33:42.0290916Z  * [new tag]                 ciflow/inductor/161938      -> ciflow/inductor/161938
2025-12-04T09:33:42.0291707Z  * [new tag]                 ciflow/inductor/161939      -> ciflow/inductor/161939
2025-12-04T09:33:42.0292550Z  * [new tag]                 ciflow/inductor/161940      -> ciflow/inductor/161940
2025-12-04T09:33:42.0293408Z  * [new tag]                 ciflow/inductor/162052      -> ciflow/inductor/162052
2025-12-04T09:33:42.0294264Z  * [new tag]                 ciflow/inductor/162275      -> ciflow/inductor/162275
2025-12-04T09:33:42.0295109Z  * [new tag]                 ciflow/inductor/162795      -> ciflow/inductor/162795
2025-12-04T09:33:42.0296375Z  * [new tag]                 ciflow/inductor/163245      -> ciflow/inductor/163245
2025-12-04T09:33:42.0297117Z  * [new tag]                 ciflow/inductor/163335      -> ciflow/inductor/163335
2025-12-04T09:33:42.0297978Z  * [new tag]                 ciflow/inductor/163503      -> ciflow/inductor/163503
2025-12-04T09:33:42.0298817Z  * [new tag]                 ciflow/inductor/163942      -> ciflow/inductor/163942
2025-12-04T09:33:42.0300016Z  * [new tag]                 ciflow/inductor/165270      -> ciflow/inductor/165270
2025-12-04T09:33:42.0300728Z  * [new tag]                 ciflow/inductor/165274      -> ciflow/inductor/165274
2025-12-04T09:33:42.0301893Z  * [new tag]                 ciflow/inductor/165322      -> ciflow/inductor/165322
2025-12-04T09:33:42.0302646Z  * [new tag]                 ciflow/inductor/165597      -> ciflow/inductor/165597
2025-12-04T09:33:42.0303615Z  * [new tag]                 ciflow/inductor/166063      -> ciflow/inductor/166063
2025-12-04T09:33:42.0304393Z  * [new tag]                 ciflow/inductor/166075      -> ciflow/inductor/166075
2025-12-04T09:33:42.0305371Z  * [new tag]                 ciflow/inductor/166165      -> ciflow/inductor/166165
2025-12-04T09:33:42.0306462Z  * [new tag]                 ciflow/inductor/166254      -> ciflow/inductor/166254
2025-12-04T09:33:42.0307255Z  * [new tag]                 ciflow/inductor/166483      -> ciflow/inductor/166483
2025-12-04T09:33:42.0308079Z  * [new tag]                 ciflow/inductor/166494      -> ciflow/inductor/166494
2025-12-04T09:33:42.0308947Z  * [new tag]                 ciflow/inductor/166545      -> ciflow/inductor/166545
2025-12-04T09:33:42.0309798Z  * [new tag]                 ciflow/inductor/166788      -> ciflow/inductor/166788
2025-12-04T09:33:42.0310926Z  * [new tag]                 ciflow/inductor/166846      -> ciflow/inductor/166846
2025-12-04T09:33:42.0311675Z  * [new tag]                 ciflow/inductor/167300      -> ciflow/inductor/167300
2025-12-04T09:33:42.0312525Z  * [new tag]                 ciflow/inductor/167407      -> ciflow/inductor/167407
2025-12-04T09:33:42.0313620Z  * [new tag]                 ciflow/inductor/167536      -> ciflow/inductor/167536
2025-12-04T09:33:42.0314478Z  * [new tag]                 ciflow/inductor/167552      -> ciflow/inductor/167552
2025-12-04T09:33:42.0315340Z  * [new tag]                 ciflow/inductor/167555      -> ciflow/inductor/167555
2025-12-04T09:33:42.0316420Z  * [new tag]                 ciflow/inductor/167583      -> ciflow/inductor/167583
2025-12-04T09:33:42.0317198Z  * [new tag]                 ciflow/inductor/167599      -> ciflow/inductor/167599
2025-12-04T09:33:42.0318067Z  * [new tag]                 ciflow/inductor/167647      -> ciflow/inductor/167647
2025-12-04T09:33:42.0318914Z  * [new tag]                 ciflow/inductor/167677      -> ciflow/inductor/167677
2025-12-04T09:33:42.0319787Z  * [new tag]                 ciflow/inductor/167680      -> ciflow/inductor/167680
2025-12-04T09:33:42.0320619Z  * [new tag]                 ciflow/inductor/167695      -> ciflow/inductor/167695
2025-12-04T09:33:42.0321496Z  * [new tag]                 ciflow/inductor/167742      -> ciflow/inductor/167742
2025-12-04T09:33:42.0322425Z  * [new tag]                 ciflow/inductor/167768      -> ciflow/inductor/167768
2025-12-04T09:33:42.0323678Z  * [new tag]                 ciflow/inductor/167773      -> ciflow/inductor/167773
2025-12-04T09:33:42.0324505Z  * [new tag]                 ciflow/inductor/167781      -> ciflow/inductor/167781
2025-12-04T09:33:42.0325357Z  * [new tag]                 ciflow/inductor/167880      -> ciflow/inductor/167880
2025-12-04T09:33:42.0326216Z  * [new tag]                 ciflow/inductor/167887      -> ciflow/inductor/167887
2025-12-04T09:33:42.0327061Z  * [new tag]                 ciflow/inductor/167972      -> ciflow/inductor/167972
2025-12-04T09:33:42.0328543Z  * [new tag]                 ciflow/inductor/167989      -> ciflow/inductor/167989
2025-12-04T09:33:42.0329292Z  * [new tag]                 ciflow/inductor/168002      -> ciflow/inductor/168002
2025-12-04T09:33:42.0330149Z  * [new tag]                 ciflow/inductor/168050      -> ciflow/inductor/168050
2025-12-04T09:33:42.0331022Z  * [new tag]                 ciflow/inductor/168051      -> ciflow/inductor/168051
2025-12-04T09:33:42.0331867Z  * [new tag]                 ciflow/inductor/168052      -> ciflow/inductor/168052
2025-12-04T09:33:42.0332846Z  * [new tag]                 ciflow/inductor/168073      -> ciflow/inductor/168073
2025-12-04T09:33:42.0333598Z  * [new tag]                 ciflow/inductor/168096      -> ciflow/inductor/168096
2025-12-04T09:33:42.0334466Z  * [new tag]                 ciflow/inductor/168114      -> ciflow/inductor/168114
2025-12-04T09:33:42.0335301Z  * [new tag]                 ciflow/inductor/168115      -> ciflow/inductor/168115
2025-12-04T09:33:42.0336168Z  * [new tag]                 ciflow/inductor/168127      -> ciflow/inductor/168127
2025-12-04T09:33:42.0337018Z  * [new tag]                 ciflow/inductor/168129      -> ciflow/inductor/168129
2025-12-04T09:33:42.0337853Z  * [new tag]                 ciflow/inductor/168157      -> ciflow/inductor/168157
2025-12-04T09:33:42.0338789Z  * [new tag]                 ciflow/inductor/168175      -> ciflow/inductor/168175
2025-12-04T09:33:42.0339562Z  * [new tag]                 ciflow/inductor/168185      -> ciflow/inductor/168185
2025-12-04T09:33:42.0340410Z  * [new tag]                 ciflow/inductor/168195      -> ciflow/inductor/168195
2025-12-04T09:33:42.0341276Z  * [new tag]                 ciflow/inductor/168209      -> ciflow/inductor/168209
2025-12-04T09:33:42.0342261Z  * [new tag]                 ciflow/inductor/168266      -> ciflow/inductor/168266
2025-12-04T09:33:42.0343283Z  * [new tag]                 ciflow/inductor/168316      -> ciflow/inductor/168316
2025-12-04T09:33:42.0344315Z  * [new tag]                 ciflow/inductor/168326      -> ciflow/inductor/168326
2025-12-04T09:33:42.0345106Z  * [new tag]                 ciflow/inductor/168368      -> ciflow/inductor/168368
2025-12-04T09:33:42.0345963Z  * [new tag]                 ciflow/inductor/168894      -> ciflow/inductor/168894
2025-12-04T09:33:42.0346831Z  * [new tag]                 ciflow/inductor/168934      -> ciflow/inductor/168934
2025-12-04T09:33:42.0347693Z  * [new tag]                 ciflow/inductor/168939      -> ciflow/inductor/168939
2025-12-04T09:33:42.0348554Z  * [new tag]                 ciflow/inductor/168946      -> ciflow/inductor/168946
2025-12-04T09:33:42.0349417Z  * [new tag]                 ciflow/inductor/168950      -> ciflow/inductor/168950
2025-12-04T09:33:42.0350397Z  * [new tag]                 ciflow/inductor/168951      -> ciflow/inductor/168951
2025-12-04T09:33:42.0351207Z  * [new tag]                 ciflow/inductor/168952      -> ciflow/inductor/168952
2025-12-04T09:33:42.0352061Z  * [new tag]                 ciflow/inductor/168955      -> ciflow/inductor/168955
2025-12-04T09:33:42.0353101Z  * [new tag]                 ciflow/inductor/168971      -> ciflow/inductor/168971
2025-12-04T09:33:42.0353761Z  * [new tag]                 ciflow/inductor/168979      -> ciflow/inductor/168979
2025-12-04T09:33:42.0354637Z  * [new tag]                 ciflow/inductor/168980      -> ciflow/inductor/168980
2025-12-04T09:33:42.0355797Z  * [new tag]                 ciflow/inductor/168983      -> ciflow/inductor/168983
2025-12-04T09:33:42.0356554Z  * [new tag]                 ciflow/inductor/169006      -> ciflow/inductor/169006
2025-12-04T09:33:42.0357446Z  * [new tag]                 ciflow/inductor/169023      -> ciflow/inductor/169023
2025-12-04T09:33:42.0358295Z  * [new tag]                 ciflow/inductor/169024      -> ciflow/inductor/169024
2025-12-04T09:33:42.0359155Z  * [new tag]                 ciflow/inductor/169025      -> ciflow/inductor/169025
2025-12-04T09:33:42.0360159Z  * [new tag]                 ciflow/inductor/169066      -> ciflow/inductor/169066
2025-12-04T09:33:42.0360919Z  * [new tag]                 ciflow/inductor/169091      -> ciflow/inductor/169091
2025-12-04T09:33:42.0361787Z  * [new tag]                 ciflow/inductor/169102      -> ciflow/inductor/169102
2025-12-04T09:33:42.0362737Z  * [new tag]                 ciflow/inductor/169103      -> ciflow/inductor/169103
2025-12-04T09:33:42.0363746Z  * [new tag]                 ciflow/inductor/169121      -> ciflow/inductor/169121
2025-12-04T09:33:42.0364483Z  * [new tag]                 ciflow/inductor/169134      -> ciflow/inductor/169134
2025-12-04T09:33:42.0365332Z  * [new tag]                 ciflow/inductor/169135      -> ciflow/inductor/169135
2025-12-04T09:33:42.0366382Z  * [new tag]                 ciflow/inductor/169141      -> ciflow/inductor/169141
2025-12-04T09:33:42.0367097Z  * [new tag]                 ciflow/inductor/169151      -> ciflow/inductor/169151
2025-12-04T09:33:42.0368185Z  * [new tag]                 ciflow/inductor/169161      -> ciflow/inductor/169161
2025-12-04T09:33:42.0368945Z  * [new tag]                 ciflow/inductor/169167      -> ciflow/inductor/169167
2025-12-04T09:33:42.0370096Z  * [new tag]                 ciflow/inductor/169177      -> ciflow/inductor/169177
2025-12-04T09:33:42.0371207Z  * [new tag]                 ciflow/inductor/169185      -> ciflow/inductor/169185
2025-12-04T09:33:42.0371986Z  * [new tag]                 ciflow/inductor/169196      -> ciflow/inductor/169196
2025-12-04T09:33:42.0372994Z  * [new tag]                 ciflow/inductor/169200      -> ciflow/inductor/169200
2025-12-04T09:33:42.0373765Z  * [new tag]                 ciflow/inductor/169204      -> ciflow/inductor/169204
2025-12-04T09:33:42.0374638Z  * [new tag]                 ciflow/inductor/169216      -> ciflow/inductor/169216
2025-12-04T09:33:42.0375475Z  * [new tag]                 ciflow/inductor/169219      -> ciflow/inductor/169219
2025-12-04T09:33:42.0376326Z  * [new tag]                 ciflow/inductor/169220      -> ciflow/inductor/169220
2025-12-04T09:33:42.0377462Z  * [new tag]                 ciflow/inductor/169230      -> ciflow/inductor/169230
2025-12-04T09:33:42.0378200Z  * [new tag]                 ciflow/inductor/169242      -> ciflow/inductor/169242
2025-12-04T09:33:42.0379076Z  * [new tag]                 ciflow/inductor/169245      -> ciflow/inductor/169245
2025-12-04T09:33:42.0380180Z  * [new tag]                 ciflow/inductor/169260      -> ciflow/inductor/169260
2025-12-04T09:33:42.0380968Z  * [new tag]                 ciflow/inductor/169282      -> ciflow/inductor/169282
2025-12-04T09:33:42.0381809Z  * [new tag]                 ciflow/inductor/169286      -> ciflow/inductor/169286
2025-12-04T09:33:42.0382692Z  * [new tag]                 ciflow/inductor/169299      -> ciflow/inductor/169299
2025-12-04T09:33:42.0383834Z  * [new tag]                 ciflow/inductor/169304      -> ciflow/inductor/169304
2025-12-04T09:33:42.0385152Z  * [new tag]                 ciflow/inductor/169305      -> ciflow/inductor/169305
2025-12-04T09:33:42.0386503Z  * [new tag]                 ciflow/inductor/169308      -> ciflow/inductor/169308
2025-12-04T09:33:42.0387271Z  * [new tag]                 ciflow/inductor/169319      -> ciflow/inductor/169319
2025-12-04T09:33:42.0388151Z  * [new tag]                 ciflow/inductor/169326      -> ciflow/inductor/169326
2025-12-04T09:33:42.0389025Z  * [new tag]                 ciflow/inductor/169332      -> ciflow/inductor/169332
2025-12-04T09:33:42.0389897Z  * [new tag]                 ciflow/inductor/169333      -> ciflow/inductor/169333
2025-12-04T09:33:42.0391123Z  * [new tag]                 ciflow/inductor/169336      -> ciflow/inductor/169336
2025-12-04T09:33:42.0391889Z  * [new tag]                 ciflow/inductor/169340      -> ciflow/inductor/169340
2025-12-04T09:33:42.0392924Z  * [new tag]                 ciflow/inductor/169341      -> ciflow/inductor/169341
2025-12-04T09:33:42.0393688Z  * [new tag]                 ciflow/inductor/169343      -> ciflow/inductor/169343
2025-12-04T09:33:42.0394557Z  * [new tag]                 ciflow/inductor/169346      -> ciflow/inductor/169346
2025-12-04T09:33:42.0395697Z  * [new tag]                 ciflow/inductor/169348      -> ciflow/inductor/169348
2025-12-04T09:33:42.0396696Z  * [new tag]                 ciflow/inductor/169350      -> ciflow/inductor/169350
2025-12-04T09:33:42.0397507Z  * [new tag]                 ciflow/inductor/169355      -> ciflow/inductor/169355
2025-12-04T09:33:42.0398366Z  * [new tag]                 ciflow/inductor/169370      -> ciflow/inductor/169370
2025-12-04T09:33:42.0399675Z  * [new tag]                 ciflow/inductor/169375      -> ciflow/inductor/169375
2025-12-04T09:33:42.0400450Z  * [new tag]                 ciflow/inductor/169389      -> ciflow/inductor/169389
2025-12-04T09:33:42.0404492Z  * [new tag]                 ciflow/inductor/169391      -> ciflow/inductor/169391
2025-12-04T09:33:42.0405643Z  * [new tag]                 ciflow/inductor/169393      -> ciflow/inductor/169393
2025-12-04T09:33:42.0406481Z  * [new tag]                 ciflow/inductor/169399      -> ciflow/inductor/169399
2025-12-04T09:33:42.0407673Z  * [new tag]                 ciflow/inductor/169400      -> ciflow/inductor/169400
2025-12-04T09:33:42.0408438Z  * [new tag]                 ciflow/inductor/169415      -> ciflow/inductor/169415
2025-12-04T09:33:42.0409510Z  * [new tag]                 ciflow/inductor/169417      -> ciflow/inductor/169417
2025-12-04T09:33:42.0410198Z  * [new tag]                 ciflow/inductor/169418      -> ciflow/inductor/169418
2025-12-04T09:33:42.0411424Z  * [new tag]                 ciflow/inductor/169430      -> ciflow/inductor/169430
2025-12-04T09:33:42.0412212Z  * [new tag]                 ciflow/inductor/169432      -> ciflow/inductor/169432
2025-12-04T09:33:42.0413056Z  * [new tag]                 ciflow/inductor/169436      -> ciflow/inductor/169436
2025-12-04T09:33:42.0414165Z  * [new tag]                 ciflow/inductor/169437      -> ciflow/inductor/169437
2025-12-04T09:33:42.0414948Z  * [new tag]                 ciflow/inductor/169438      -> ciflow/inductor/169438
2025-12-04T09:33:42.0415815Z  * [new tag]                 ciflow/inductor/169441      -> ciflow/inductor/169441
2025-12-04T09:33:42.0416798Z  * [new tag]                 ciflow/inductor/169446      -> ciflow/inductor/169446
2025-12-04T09:33:42.0417753Z  * [new tag]                 ciflow/inductor/169447      -> ciflow/inductor/169447
2025-12-04T09:33:42.0418592Z  * [new tag]                 ciflow/inductor/169452      -> ciflow/inductor/169452
2025-12-04T09:33:42.0419659Z  * [new tag]                 ciflow/inductor/169455      -> ciflow/inductor/169455
2025-12-04T09:33:42.0420472Z  * [new tag]                 ciflow/inductor/169459      -> ciflow/inductor/169459
2025-12-04T09:33:42.0421549Z  * [new tag]                 ciflow/inductor/169463      -> ciflow/inductor/169463
2025-12-04T09:33:42.0422633Z  * [new tag]                 ciflow/inductor/169476      -> ciflow/inductor/169476
2025-12-04T09:33:42.0423393Z  * [new tag]                 ciflow/inductor/169485      -> ciflow/inductor/169485
2025-12-04T09:33:42.0424344Z  * [new tag]                 ciflow/inductor/169493      -> ciflow/inductor/169493
2025-12-04T09:33:42.0425145Z  * [new tag]                 ciflow/inductor/169496      -> ciflow/inductor/169496
2025-12-04T09:33:42.0426101Z  * [new tag]                 ciflow/inductor/169497      -> ciflow/inductor/169497
2025-12-04T09:33:42.0426895Z  * [new tag]                 ciflow/inductor/169503      -> ciflow/inductor/169503
2025-12-04T09:33:42.0427748Z  * [new tag]                 ciflow/inductor/169504      -> ciflow/inductor/169504
2025-12-04T09:33:42.0429295Z  * [new tag]                 ciflow/inductor/169505      -> ciflow/inductor/169505
2025-12-04T09:33:42.0430825Z  * [new tag]                 ciflow/inductor/169508      -> ciflow/inductor/169508
2025-12-04T09:33:42.0431615Z  * [new tag]                 ciflow/inductor/169509      -> ciflow/inductor/169509
2025-12-04T09:33:42.0432647Z  * [new tag]                 ciflow/inductor/169513      -> ciflow/inductor/169513
2025-12-04T09:33:42.0433432Z  * [new tag]                 ciflow/inductor/169514      -> ciflow/inductor/169514
2025-12-04T09:33:42.0434444Z  * [new tag]                 ciflow/inductor/169515      -> ciflow/inductor/169515
2025-12-04T09:33:42.0435188Z  * [new tag]                 ciflow/inductor/169517      -> ciflow/inductor/169517
2025-12-04T09:33:42.0436240Z  * [new tag]                 ciflow/inductor/169519      -> ciflow/inductor/169519
2025-12-04T09:33:42.0437068Z  * [new tag]                 ciflow/inductor/169520      -> ciflow/inductor/169520
2025-12-04T09:33:42.0438050Z  * [new tag]                 ciflow/inductor/169521      -> ciflow/inductor/169521
2025-12-04T09:33:42.0438825Z  * [new tag]                 ciflow/inductor/169524      -> ciflow/inductor/169524
2025-12-04T09:33:42.0439683Z  * [new tag]                 ciflow/inductor/169527      -> ciflow/inductor/169527
2025-12-04T09:33:42.0440643Z  * [new tag]                 ciflow/inductor/169528      -> ciflow/inductor/169528
2025-12-04T09:33:42.0441684Z  * [new tag]                 ciflow/inductor/169532      -> ciflow/inductor/169532
2025-12-04T09:33:42.0442544Z  * [new tag]                 ciflow/inductor/169535      -> ciflow/inductor/169535
2025-12-04T09:33:42.0443593Z  * [new tag]                 ciflow/inductor/169536      -> ciflow/inductor/169536
2025-12-04T09:33:42.0444456Z  * [new tag]                 ciflow/inductor/169547      -> ciflow/inductor/169547
2025-12-04T09:33:42.0445289Z  * [new tag]                 ciflow/inductor/169548      -> ciflow/inductor/169548
2025-12-04T09:33:42.0446160Z  * [new tag]                 ciflow/inductor/169549      -> ciflow/inductor/169549
2025-12-04T09:33:42.0447062Z  * [new tag]                 ciflow/inductor/169551      -> ciflow/inductor/169551
2025-12-04T09:33:42.0447939Z  * [new tag]                 ciflow/inductor/169552      -> ciflow/inductor/169552
2025-12-04T09:33:42.0448814Z  * [new tag]                 ciflow/inductor/169553      -> ciflow/inductor/169553
2025-12-04T09:33:42.0450271Z  * [new tag]                 ciflow/inductor/169557      -> ciflow/inductor/169557
2025-12-04T09:33:42.0451387Z  * [new tag]                 ciflow/inductor/3b9a386     -> ciflow/inductor/3b9a386
2025-12-04T09:33:42.0452491Z  * [new tag]                 ciflow/inductor/3d4b92b     -> ciflow/inductor/3d4b92b
2025-12-04T09:33:42.0453532Z  * [new tag]                 ciflow/inductor/d224ac7     -> ciflow/inductor/d224ac7
2025-12-04T09:33:42.0454586Z  * [new tag]                 ciflow/linux-aarch64/157994 -> ciflow/linux-aarch64/157994
2025-12-04T09:33:42.0455326Z  * [new tag]                 ciflow/linux-aarch64/166075 -> ciflow/linux-aarch64/166075
2025-12-04T09:33:42.0456171Z  * [new tag]                 ciflow/linux-aarch64/166876 -> ciflow/linux-aarch64/166876
2025-12-04T09:33:42.0456968Z  * [new tag]                 ciflow/linux-aarch64/167981 -> ciflow/linux-aarch64/167981
2025-12-04T09:33:42.0458019Z  * [new tag]                 ciflow/mps/166254           -> ciflow/mps/166254
2025-12-04T09:33:42.0458766Z  * [new tag]                 ciflow/mps/169017           -> ciflow/mps/169017
2025-12-04T09:33:42.0459868Z  * [new tag]                 ciflow/mps/169372           -> ciflow/mps/169372
2025-12-04T09:33:42.0460591Z  * [new tag]                 ciflow/mps/169478           -> ciflow/mps/169478
2025-12-04T09:33:42.0461743Z  * [new tag]                 ciflow/op-benchmark/157994  -> ciflow/op-benchmark/157994
2025-12-04T09:33:42.0462506Z  * [new tag]                 ciflow/op-benchmark/166075  -> ciflow/op-benchmark/166075
2025-12-04T09:33:42.0463560Z  * [new tag]                 ciflow/op-benchmark/169544  -> ciflow/op-benchmark/169544
2025-12-04T09:33:42.0464512Z  * [new tag]                 ciflow/periodic-rocm-mi200/165997 -> ciflow/periodic-rocm-mi200/165997
2025-12-04T09:33:42.0465478Z  * [new tag]                 ciflow/periodic-rocm-mi200/166517 -> ciflow/periodic-rocm-mi200/166517
2025-12-04T09:33:42.0466271Z  * [new tag]                 ciflow/periodic-rocm-mi200/169063 -> ciflow/periodic-rocm-mi200/169063
2025-12-04T09:33:42.0467150Z  * [new tag]                 ciflow/periodic-rocm-mi200/169425 -> ciflow/periodic-rocm-mi200/169425
2025-12-04T09:33:42.0468113Z  * [new tag]                 ciflow/periodic-rocm-mi300/166517 -> ciflow/periodic-rocm-mi300/166517
2025-12-04T09:33:42.0468924Z  * [new tag]                 ciflow/periodic-rocm-mi300/169063 -> ciflow/periodic-rocm-mi300/169063
2025-12-04T09:33:42.0469741Z  * [new tag]                 ciflow/periodic-rocm-mi300/169425 -> ciflow/periodic-rocm-mi300/169425
2025-12-04T09:33:42.0471023Z  * [new tag]                 ciflow/periodic/054a2fd     -> ciflow/periodic/054a2fd
2025-12-04T09:33:42.0471782Z  * [new tag]                 ciflow/periodic/167207      -> ciflow/periodic/167207
2025-12-04T09:33:42.0472761Z  * [new tag]                 ciflow/periodic/167978      -> ciflow/periodic/167978
2025-12-04T09:33:42.0473515Z  * [new tag]                 ciflow/periodic/168096      -> ciflow/periodic/168096
2025-12-04T09:33:42.0474399Z  * [new tag]                 ciflow/periodic/169286      -> ciflow/periodic/169286
2025-12-04T09:33:42.0475545Z  * [new tag]                 ciflow/periodic/2a6d37d     -> ciflow/periodic/2a6d37d
2025-12-04T09:33:42.0476523Z  * [new tag]                 ciflow/periodic/317eeb8     -> ciflow/periodic/317eeb8
2025-12-04T09:33:42.0477470Z  * [new tag]                 ciflow/periodic/3c32        -> ciflow/periodic/3c32
2025-12-04T09:33:42.0478592Z  * [new tag]                 ciflow/periodic/3e98831     -> ciflow/periodic/3e98831
2025-12-04T09:33:42.0480243Z  * [new tag]                 ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> ciflow/periodic/7c648509a7470ace9fb2bae960dd4790f7e943e9
2025-12-04T09:33:42.0481249Z  * [new tag]                 ciflow/periodic/94512-point -> ciflow/periodic/94512-point
2025-12-04T09:33:42.0482660Z  * [new tag]                 ciflow/periodic/csl/test87519 -> ciflow/periodic/csl/test87519
2025-12-04T09:33:42.0483777Z  * [new tag]                 ciflow/periodic/csltest88275 -> ciflow/periodic/csltest88275
2025-12-04T09:33:42.0484697Z  * [new tag]                 ciflow/periodic/csltest88761 -> ciflow/periodic/csltest88761
2025-12-04T09:33:42.0485872Z  * [new tag]                 ciflow/periodic/release_1.12 -> ciflow/periodic/release_1.12
2025-12-04T09:33:42.0487080Z  * [new tag]                 ciflow/periodic/release_1.12.0 -> ciflow/periodic/release_1.12.0
2025-12-04T09:33:42.0488295Z  * [new tag]                 ciflow/periodic/sha-ec5b83  -> ciflow/periodic/sha-ec5b83
2025-12-04T09:33:42.0489331Z  * [new tag]                 ciflow/pull/167207          -> ciflow/pull/167207
2025-12-04T09:33:42.0490752Z  * [new tag]                 ciflow/quantization-periodic/169207 -> ciflow/quantization-periodic/169207
2025-12-04T09:33:42.0491499Z  * [new tag]                 ciflow/rocm-mi200/165545    -> ciflow/rocm-mi200/165545
2025-12-04T09:33:42.0492344Z  * [new tag]                 ciflow/rocm-mi200/165997    -> ciflow/rocm-mi200/165997
2025-12-04T09:33:42.0493135Z  * [new tag]                 ciflow/rocm-mi200/168096    -> ciflow/rocm-mi200/168096
2025-12-04T09:33:42.0494227Z  * [new tag]                 ciflow/rocm-mi200/168275    -> ciflow/rocm-mi200/168275
2025-12-04T09:33:42.0494935Z  * [new tag]                 ciflow/rocm-mi200/169063    -> ciflow/rocm-mi200/169063
2025-12-04T09:33:42.0496037Z  * [new tag]                 ciflow/rocm-mi200/169356    -> ciflow/rocm-mi200/169356
2025-12-04T09:33:42.0496770Z  * [new tag]                 ciflow/rocm-mi200/169425    -> ciflow/rocm-mi200/169425
2025-12-04T09:33:42.0497837Z  * [new tag]                 ciflow/rocm-mi300/165545    -> ciflow/rocm-mi300/165545
2025-12-04T09:33:42.0498864Z  * [new tag]                 ciflow/rocm-mi300/167157    -> ciflow/rocm-mi300/167157
2025-12-04T09:33:42.0499580Z  * [new tag]                 ciflow/rocm-mi300/168096    -> ciflow/rocm-mi300/168096
2025-12-04T09:33:42.0500394Z  * [new tag]                 ciflow/rocm-mi300/169063    -> ciflow/rocm-mi300/169063
2025-12-04T09:33:42.0501362Z  * [new tag]                 ciflow/rocm-mi300/169425    -> ciflow/rocm-mi300/169425
2025-12-04T09:33:42.0502514Z  * [new tag]                 ciflow/rocm-mi355/167157    -> ciflow/rocm-mi355/167157
2025-12-04T09:33:42.0503302Z  * [new tag]                 ciflow/rocm-mi355/168275    -> ciflow/rocm-mi355/168275
2025-12-04T09:33:42.0504108Z  * [new tag]                 ciflow/rocm-mi355/169425    -> ciflow/rocm-mi355/169425
2025-12-04T09:33:42.0505234Z  * [new tag]                 ciflow/rocm-navi31/168275   -> ciflow/rocm-navi31/168275
2025-12-04T09:33:42.0505909Z  * [new tag]                 ciflow/rocm-navi31/169425   -> ciflow/rocm-navi31/169425
2025-12-04T09:33:42.0506979Z  * [new tag]                 ciflow/rocm/115316          -> ciflow/rocm/115316
2025-12-04T09:33:42.0507722Z  * [new tag]                 ciflow/rocm/148492          -> ciflow/rocm/148492
2025-12-04T09:33:42.0508524Z  * [new tag]                 ciflow/rocm/160685          -> ciflow/rocm/160685
2025-12-04T09:33:42.0509369Z  * [new tag]                 ciflow/rocm/161607          -> ciflow/rocm/161607
2025-12-04T09:33:42.0510147Z  * [new tag]                 ciflow/rocm/162052          -> ciflow/rocm/162052
2025-12-04T09:33:42.0510990Z  * [new tag]                 ciflow/rocm/165997          -> ciflow/rocm/165997
2025-12-04T09:33:42.0511904Z  * [new tag]                 ciflow/rocm/166165          -> ciflow/rocm/166165
2025-12-04T09:33:42.0512631Z  * [new tag]                 ciflow/rocm/166517          -> ciflow/rocm/166517
2025-12-04T09:33:42.0513449Z  * [new tag]                 ciflow/rocm/167207          -> ciflow/rocm/167207
2025-12-04T09:33:42.0514424Z  * [new tag]                 ciflow/rocm/167536          -> ciflow/rocm/167536
2025-12-04T09:33:42.0515185Z  * [new tag]                 ciflow/rocm/167781          -> ciflow/rocm/167781
2025-12-04T09:33:42.0516418Z  * [new tag]                 ciflow/rocm/167989          -> ciflow/rocm/167989
2025-12-04T09:33:42.0517520Z  * [new tag]                 ciflow/rocm/168073          -> ciflow/rocm/168073
2025-12-04T09:33:42.0518570Z  * [new tag]                 ciflow/rocm/168195          -> ciflow/rocm/168195
2025-12-04T09:33:42.0519343Z  * [new tag]                 ciflow/rocm/168939          -> ciflow/rocm/168939
2025-12-04T09:33:42.0520343Z  * [new tag]                 ciflow/rocm/168971          -> ciflow/rocm/168971
2025-12-04T09:33:42.0521088Z  * [new tag]                 ciflow/rocm/169024          -> ciflow/rocm/169024
2025-12-04T09:33:42.0521947Z  * [new tag]                 ciflow/rocm/169200          -> ciflow/rocm/169200
2025-12-04T09:33:42.0523021Z  * [new tag]                 ciflow/rocm/169216          -> ciflow/rocm/169216
2025-12-04T09:33:42.0523824Z  * [new tag]                 ciflow/rocm/169312          -> ciflow/rocm/169312
2025-12-04T09:33:42.0524644Z  * [new tag]                 ciflow/rocm/169380          -> ciflow/rocm/169380
2025-12-04T09:33:42.0525604Z  * [new tag]                 ciflow/rocm/169427          -> ciflow/rocm/169427
2025-12-04T09:33:42.0526391Z  * [new tag]                 ciflow/rocm/169455          -> ciflow/rocm/169455
2025-12-04T09:33:42.0527341Z  * [new tag]                 ciflow/rocm/169470          -> ciflow/rocm/169470
2025-12-04T09:33:42.0528122Z  * [new tag]                 ciflow/rocm/169471          -> ciflow/rocm/169471
2025-12-04T09:33:42.0528962Z  * [new tag]                 ciflow/rocm/169472          -> ciflow/rocm/169472
2025-12-04T09:33:42.0529812Z  * [new tag]                 ciflow/rocm/169514          -> ciflow/rocm/169514
2025-12-04T09:33:42.0531172Z  * [new tag]                 ciflow/slow/01c7106         -> ciflow/slow/01c7106
2025-12-04T09:33:42.0532109Z  * [new tag]                 ciflow/slow/0577043         -> ciflow/slow/0577043
2025-12-04T09:33:42.0533704Z  * [new tag]                 ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym -> ciflow/slow/0d5b74da0cab798fbfdb9caa53fad816999c8386-sdym
2025-12-04T09:33:42.0534170Z  * [new tag]                 ciflow/slow/0e81104         -> ciflow/slow/0e81104
2025-12-04T09:33:42.0535024Z  * [new tag]                 ciflow/slow/167207          -> ciflow/slow/167207
2025-12-04T09:33:42.0535803Z  * [new tag]                 ciflow/slow/168050          -> ciflow/slow/168050
2025-12-04T09:33:42.0536903Z  * [new tag]                 ciflow/slow/1732077         -> ciflow/slow/1732077
2025-12-04T09:33:42.0538050Z  * [new tag]                 ciflow/slow/187eb7c         -> ciflow/slow/187eb7c
2025-12-04T09:33:42.0539350Z  * [new tag]                 ciflow/slow/1faef89         -> ciflow/slow/1faef89
2025-12-04T09:33:42.0540624Z  * [new tag]                 ciflow/slow/3920ec1         -> ciflow/slow/3920ec1
2025-12-04T09:33:42.0541801Z  * [new tag]                 ciflow/slow/3b7c6b2         -> ciflow/slow/3b7c6b2
2025-12-04T09:33:42.0542886Z  * [new tag]                 ciflow/slow/59a3759         -> ciflow/slow/59a3759
2025-12-04T09:33:42.0543937Z  * [new tag]                 ciflow/slow/70ef0bb         -> ciflow/slow/70ef0bb
2025-12-04T09:33:42.0545041Z  * [new tag]                 ciflow/slow/788ff06         -> ciflow/slow/788ff06
2025-12-04T09:33:42.0546698Z  * [new tag]                 ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym -> ciflow/slow/8751002215790a3a88750faa8f4366933e296693-sdym
2025-12-04T09:33:42.0547242Z  * [new tag]                 ciflow/slow/9d85864         -> ciflow/slow/9d85864
2025-12-04T09:33:42.0548542Z  * [new tag]                 ciflow/slow/9ffad5b         -> ciflow/slow/9ffad5b
2025-12-04T09:33:42.0549276Z  * [new tag]                 ciflow/slow/a206e8b         -> ciflow/slow/a206e8b
2025-12-04T09:33:42.0550428Z  * [new tag]                 ciflow/slow/a837609         -> ciflow/slow/a837609
2025-12-04T09:33:42.0551497Z  * [new tag]                 ciflow/slow/af841f3         -> ciflow/slow/af841f3
2025-12-04T09:33:42.0553135Z  * [new tag]                 ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym -> ciflow/slow/da3aba1e46157c4df504b067477cdf2b3c96b194-sdym
2025-12-04T09:33:42.0553713Z  * [new tag]                 ciflow/torchbench/168175    -> ciflow/torchbench/168175
2025-12-04T09:33:42.0554787Z  * [new tag]                 ciflow/trunk/148492         -> ciflow/trunk/148492
2025-12-04T09:33:42.0555543Z  * [new tag]                 ciflow/trunk/157149         -> ciflow/trunk/157149
2025-12-04T09:33:42.0556352Z  * [new tag]                 ciflow/trunk/157994         -> ciflow/trunk/157994
2025-12-04T09:33:42.0557167Z  * [new tag]                 ciflow/trunk/159718         -> ciflow/trunk/159718
2025-12-04T09:33:42.0557974Z  * [new tag]                 ciflow/trunk/160685         -> ciflow/trunk/160685
2025-12-04T09:33:42.0558795Z  * [new tag]                 ciflow/trunk/160729         -> ciflow/trunk/160729
2025-12-04T09:33:42.0559636Z  * [new tag]                 ciflow/trunk/162275         -> ciflow/trunk/162275
2025-12-04T09:33:42.0560428Z  * [new tag]                 ciflow/trunk/162795         -> ciflow/trunk/162795
2025-12-04T09:33:42.0561279Z  * [new tag]                 ciflow/trunk/163245         -> ciflow/trunk/163245
2025-12-04T09:33:42.0562090Z  * [new tag]                 ciflow/trunk/163942         -> ciflow/trunk/163942
2025-12-04T09:33:42.0563675Z  * [new tag]                 ciflow/trunk/165274         -> ciflow/trunk/165274
2025-12-04T09:33:42.0564991Z  * [new tag]                 ciflow/trunk/165483         -> ciflow/trunk/165483
2025-12-04T09:33:42.0566213Z  * [new tag]                 ciflow/trunk/165728         -> ciflow/trunk/165728
2025-12-04T09:33:42.0567299Z  * [new tag]                 ciflow/trunk/165922         -> ciflow/trunk/165922
2025-12-04T09:33:42.0568100Z  * [new tag]                 ciflow/trunk/166075         -> ciflow/trunk/166075
2025-12-04T09:33:42.0569055Z  * [new tag]                 ciflow/trunk/166165         -> ciflow/trunk/166165
2025-12-04T09:33:42.0569846Z  * [new tag]                 ciflow/trunk/166829         -> ciflow/trunk/166829
2025-12-04T09:33:42.0570976Z  * [new tag]                 ciflow/trunk/166843         -> ciflow/trunk/166843
2025-12-04T09:33:42.0571763Z  * [new tag]                 ciflow/trunk/166876         -> ciflow/trunk/166876
2025-12-04T09:33:42.0572615Z  * [new tag]                 ciflow/trunk/167207         -> ciflow/trunk/167207
2025-12-04T09:33:42.0573584Z  * [new tag]                 ciflow/trunk/167536         -> ciflow/trunk/167536
2025-12-04T09:33:42.0574583Z  * [new tag]                 ciflow/trunk/167552         -> ciflow/trunk/167552
2025-12-04T09:33:42.0575516Z  * [new tag]                 ciflow/trunk/167555         -> ciflow/trunk/167555
2025-12-04T09:33:42.0576350Z  * [new tag]                 ciflow/trunk/167599         -> ciflow/trunk/167599
2025-12-04T09:33:42.0577317Z  * [new tag]                 ciflow/trunk/167659         -> ciflow/trunk/167659
2025-12-04T09:33:42.0578340Z  * [new tag]                 ciflow/trunk/167672         -> ciflow/trunk/167672
2025-12-04T09:33:42.0579123Z  * [new tag]                 ciflow/trunk/167742         -> ciflow/trunk/167742
2025-12-04T09:33:42.0580134Z  * [new tag]                 ciflow/trunk/167781         -> ciflow/trunk/167781
2025-12-04T09:33:42.0581197Z  * [new tag]                 ciflow/trunk/167837         -> ciflow/trunk/167837
2025-12-04T09:33:42.0582007Z  * [new tag]                 ciflow/trunk/167887         -> ciflow/trunk/167887
2025-12-04T09:33:42.0582992Z  * [new tag]                 ciflow/trunk/167978         -> ciflow/trunk/167978
2025-12-04T09:33:42.0583820Z  * [new tag]                 ciflow/trunk/168050         -> ciflow/trunk/168050
2025-12-04T09:33:42.0584626Z  * [new tag]                 ciflow/trunk/168051         -> ciflow/trunk/168051
2025-12-04T09:33:42.0585632Z  * [new tag]                 ciflow/trunk/168096         -> ciflow/trunk/168096
2025-12-04T09:33:42.0586378Z  * [new tag]                 ciflow/trunk/168127         -> ciflow/trunk/168127
2025-12-04T09:33:42.0587234Z  * [new tag]                 ciflow/trunk/168157         -> ciflow/trunk/168157
2025-12-04T09:33:42.0588090Z  * [new tag]                 ciflow/trunk/168175         -> ciflow/trunk/168175
2025-12-04T09:33:42.0588964Z  * [new tag]                 ciflow/trunk/168209         -> ciflow/trunk/168209
2025-12-04T09:33:42.0590079Z  * [new tag]                 ciflow/trunk/168213         -> ciflow/trunk/168213
2025-12-04T09:33:42.0591092Z  * [new tag]                 ciflow/trunk/168226         -> ciflow/trunk/168226
2025-12-04T09:33:42.0591884Z  * [new tag]                 ciflow/trunk/168262         -> ciflow/trunk/168262
2025-12-04T09:33:42.0592746Z  * [new tag]                 ciflow/trunk/168275         -> ciflow/trunk/168275
2025-12-04T09:33:42.0593864Z  * [new tag]                 ciflow/trunk/168328         -> ciflow/trunk/168328
2025-12-04T09:33:42.0594652Z  * [new tag]                 ciflow/trunk/168368         -> ciflow/trunk/168368
2025-12-04T09:33:42.0595637Z  * [new tag]                 ciflow/trunk/168917         -> ciflow/trunk/168917
2025-12-04T09:33:42.0596393Z  * [new tag]                 ciflow/trunk/168933         -> ciflow/trunk/168933
2025-12-04T09:33:42.0597514Z  * [new tag]                 ciflow/trunk/168941         -> ciflow/trunk/168941
2025-12-04T09:33:42.0598271Z  * [new tag]                 ciflow/trunk/168955         -> ciflow/trunk/168955
2025-12-04T09:33:42.0599223Z  * [new tag]                 ciflow/trunk/168980         -> ciflow/trunk/168980
2025-12-04T09:33:42.0600320Z  * [new tag]                 ciflow/trunk/169004         -> ciflow/trunk/169004
2025-12-04T09:33:42.0601217Z  * [new tag]                 ciflow/trunk/169006         -> ciflow/trunk/169006
2025-12-04T09:33:42.0602292Z  * [new tag]                 ciflow/trunk/169023         -> ciflow/trunk/169023
2025-12-04T09:33:42.0603251Z  * [new tag]                 ciflow/trunk/169025         -> ciflow/trunk/169025
2025-12-04T09:33:42.0604232Z  * [new tag]                 ciflow/trunk/169048         -> ciflow/trunk/169048
2025-12-04T09:33:42.0605009Z  * [new tag]                 ciflow/trunk/169066         -> ciflow/trunk/169066
2025-12-04T09:33:42.0605866Z  * [new tag]                 ciflow/trunk/169091         -> ciflow/trunk/169091
2025-12-04T09:33:42.0606800Z  * [new tag]                 ciflow/trunk/169102         -> ciflow/trunk/169102
2025-12-04T09:33:42.0607626Z  * [new tag]                 ciflow/trunk/169103         -> ciflow/trunk/169103
2025-12-04T09:33:42.0608761Z  * [new tag]                 ciflow/trunk/169125         -> ciflow/trunk/169125
2025-12-04T09:33:42.0609753Z  * [new tag]                 ciflow/trunk/169139         -> ciflow/trunk/169139
2025-12-04T09:33:42.0610806Z  * [new tag]                 ciflow/trunk/169148         -> ciflow/trunk/169148
2025-12-04T09:33:42.0611621Z  * [new tag]                 ciflow/trunk/169151         -> ciflow/trunk/169151
2025-12-04T09:33:42.0612621Z  * [new tag]                 ciflow/trunk/169156         -> ciflow/trunk/169156
2025-12-04T09:33:42.0613634Z  * [new tag]                 ciflow/trunk/169176         -> ciflow/trunk/169176
2025-12-04T09:33:42.0614420Z  * [new tag]                 ciflow/trunk/169204         -> ciflow/trunk/169204
2025-12-04T09:33:42.0615409Z  * [new tag]                 ciflow/trunk/169207         -> ciflow/trunk/169207
2025-12-04T09:33:42.0616172Z  * [new tag]                 ciflow/trunk/169211         -> ciflow/trunk/169211
2025-12-04T09:33:42.0617378Z  * [new tag]                 ciflow/trunk/169231         -> ciflow/trunk/169231
2025-12-04T09:33:42.0618342Z  * [new tag]                 ciflow/trunk/169260         -> ciflow/trunk/169260
2025-12-04T09:33:42.0619621Z  * [new tag]                 ciflow/trunk/169271         -> ciflow/trunk/169271
2025-12-04T09:33:42.0620393Z  * [new tag]                 ciflow/trunk/169280         -> ciflow/trunk/169280
2025-12-04T09:33:42.0621260Z  * [new tag]                 ciflow/trunk/169281         -> ciflow/trunk/169281
2025-12-04T09:33:42.0622221Z  * [new tag]                 ciflow/trunk/169286         -> ciflow/trunk/169286
2025-12-04T09:33:42.0623297Z  * [new tag]                 ciflow/trunk/169293         -> ciflow/trunk/169293
2025-12-04T09:33:42.0624665Z  * [new tag]                 ciflow/trunk/169296         -> ciflow/trunk/169296
2025-12-04T09:33:42.0625479Z  * [new tag]                 ciflow/trunk/169304         -> ciflow/trunk/169304
2025-12-04T09:33:42.0626428Z  * [new tag]                 ciflow/trunk/169305         -> ciflow/trunk/169305
2025-12-04T09:33:42.0627258Z  * [new tag]                 ciflow/trunk/169312         -> ciflow/trunk/169312
2025-12-04T09:33:42.0628508Z  * [new tag]                 ciflow/trunk/169328         -> ciflow/trunk/169328
2025-12-04T09:33:42.0629318Z  * [new tag]                 ciflow/trunk/169343         -> ciflow/trunk/169343
2025-12-04T09:33:42.0630312Z  * [new tag]                 ciflow/trunk/169355         -> ciflow/trunk/169355
2025-12-04T09:33:42.0631139Z  * [new tag]                 ciflow/trunk/169370         -> ciflow/trunk/169370
2025-12-04T09:33:42.0632252Z  * [new tag]                 ciflow/trunk/169379         -> ciflow/trunk/169379
2025-12-04T09:33:42.0633032Z  * [new tag]                 ciflow/trunk/169380         -> ciflow/trunk/169380
2025-12-04T09:33:42.0634014Z  * [new tag]                 ciflow/trunk/169385         -> ciflow/trunk/169385
2025-12-04T09:33:42.0635010Z  * [new tag]                 ciflow/trunk/169387         -> ciflow/trunk/169387
2025-12-04T09:33:42.0636018Z  * [new tag]                 ciflow/trunk/169410         -> ciflow/trunk/169410
2025-12-04T09:33:42.0636830Z  * [new tag]                 ciflow/trunk/169412         -> ciflow/trunk/169412
2025-12-04T09:33:42.0637688Z  * [new tag]                 ciflow/trunk/169418         -> ciflow/trunk/169418
2025-12-04T09:33:42.0638680Z  * [new tag]                 ciflow/trunk/169423         -> ciflow/trunk/169423
2025-12-04T09:33:42.0639451Z  * [new tag]                 ciflow/trunk/169427         -> ciflow/trunk/169427
2025-12-04T09:33:42.0640495Z  * [new tag]                 ciflow/trunk/169430         -> ciflow/trunk/169430
2025-12-04T09:33:42.0641228Z  * [new tag]                 ciflow/trunk/169437         -> ciflow/trunk/169437
2025-12-04T09:33:42.0642151Z  * [new tag]                 ciflow/trunk/169442         -> ciflow/trunk/169442
2025-12-04T09:33:42.0643221Z  * [new tag]                 ciflow/trunk/169452         -> ciflow/trunk/169452
2025-12-04T09:33:42.0643974Z  * [new tag]                 ciflow/trunk/169454         -> ciflow/trunk/169454
2025-12-04T09:33:42.0644959Z  * [new tag]                 ciflow/trunk/169459         -> ciflow/trunk/169459
2025-12-04T09:33:42.0646068Z  * [new tag]                 ciflow/trunk/169474         -> ciflow/trunk/169474
2025-12-04T09:33:42.0646862Z  * [new tag]                 ciflow/trunk/169475         -> ciflow/trunk/169475
2025-12-04T09:33:42.0647860Z  * [new tag]                 ciflow/trunk/169476         -> ciflow/trunk/169476
2025-12-04T09:33:42.0648880Z  * [new tag]                 ciflow/trunk/169487         -> ciflow/trunk/169487
2025-12-04T09:33:42.0649679Z  * [new tag]                 ciflow/trunk/169497         -> ciflow/trunk/169497
2025-12-04T09:33:42.0650540Z  * [new tag]                 ciflow/trunk/169503         -> ciflow/trunk/169503
2025-12-04T09:33:42.0651940Z  * [new tag]                 ciflow/trunk/169505         -> ciflow/trunk/169505
2025-12-04T09:33:42.0652293Z  * [new tag]                 ciflow/trunk/169507         -> ciflow/trunk/169507
2025-12-04T09:33:42.0653302Z  * [new tag]                 ciflow/trunk/169514         -> ciflow/trunk/169514
2025-12-04T09:33:42.0654134Z  * [new tag]                 ciflow/trunk/169517         -> ciflow/trunk/169517
2025-12-04T09:33:42.0654893Z  * [new tag]                 ciflow/trunk/169519         -> ciflow/trunk/169519
2025-12-04T09:33:42.0655887Z  * [new tag]                 ciflow/trunk/169528         -> ciflow/trunk/169528
2025-12-04T09:33:42.0656665Z  * [new tag]                 ciflow/trunk/169541         -> ciflow/trunk/169541
2025-12-04T09:33:42.0657800Z  * [new tag]                 ciflow/trunk/169555         -> ciflow/trunk/169555
2025-12-04T09:33:42.0659116Z  * [new tag]                 ciflow/unstable/123         -> ciflow/unstable/123
2025-12-04T09:33:42.0660619Z  * [new tag]                 ciflow/vllm/165270          -> ciflow/vllm/165270
2025-12-04T09:33:42.0661359Z  * [new tag]                 ciflow/vllm/165274          -> ciflow/vllm/165274
2025-12-04T09:33:42.0662150Z  * [new tag]                 ciflow/vllm/166494          -> ciflow/vllm/166494
2025-12-04T09:33:42.0663191Z  * [new tag]                 ciflow/vllm/169219          -> ciflow/vllm/169219
2025-12-04T09:33:42.0663891Z  * [new tag]                 ciflow/vllm/169220          -> ciflow/vllm/169220
2025-12-04T09:33:42.0664995Z  * [new tag]                 ciflow/xpu/157994           -> ciflow/xpu/157994
2025-12-04T09:33:42.0665678Z  * [new tag]                 ciflow/xpu/159718           -> ciflow/xpu/159718
2025-12-04T09:33:42.0666506Z  * [new tag]                 ciflow/xpu/161940           -> ciflow/xpu/161940
2025-12-04T09:33:42.0667580Z  * [new tag]                 ciflow/xpu/163251           -> ciflow/xpu/163251
2025-12-04T09:33:42.0668254Z  * [new tag]                 ciflow/xpu/166829           -> ciflow/xpu/166829
2025-12-04T09:33:42.0669247Z  * [new tag]                 ciflow/xpu/166843           -> ciflow/xpu/166843
2025-12-04T09:33:42.0670059Z  * [new tag]                 ciflow/xpu/167972           -> ciflow/xpu/167972
2025-12-04T09:33:42.0670735Z  * [new tag]                 ciflow/xpu/167981           -> ciflow/xpu/167981
2025-12-04T09:33:42.0671566Z  * [new tag]                 ciflow/xpu/168213           -> ciflow/xpu/168213
2025-12-04T09:33:42.0672382Z  * [new tag]                 ciflow/xpu/168262           -> ciflow/xpu/168262
2025-12-04T09:33:42.0673182Z  * [new tag]                 ciflow/xpu/168328           -> ciflow/xpu/168328
2025-12-04T09:33:42.0674415Z  * [new tag]                 ciflow/xpu/168950           -> ciflow/xpu/168950
2025-12-04T09:33:42.0675697Z  * [new tag]                 ciflow/xpu/169039           -> ciflow/xpu/169039
2025-12-04T09:33:42.0676768Z  * [new tag]                 ciflow/xpu/169200           -> ciflow/xpu/169200
2025-12-04T09:33:42.0677545Z  * [new tag]                 ciflow/xpu/169203           -> ciflow/xpu/169203
2025-12-04T09:33:42.0678572Z  * [new tag]                 ciflow/xpu/169230           -> ciflow/xpu/169230
2025-12-04T09:33:42.0679286Z  * [new tag]                 ciflow/xpu/169231           -> ciflow/xpu/169231
2025-12-04T09:33:42.0680393Z  * [new tag]                 ciflow/xpu/169241           -> ciflow/xpu/169241
2025-12-04T09:33:42.0681161Z  * [new tag]                 ciflow/xpu/169280           -> ciflow/xpu/169280
2025-12-04T09:33:42.0682122Z  * [new tag]                 ciflow/xpu/169296           -> ciflow/xpu/169296
2025-12-04T09:33:42.0683325Z  * [new tag]                 ciflow/xpu/169353           -> ciflow/xpu/169353
2025-12-04T09:33:42.0684023Z  * [new tag]                 ciflow/xpu/169410           -> ciflow/xpu/169410
2025-12-04T09:33:42.0684881Z  * [new tag]                 ciflow/xpu/169442           -> ciflow/xpu/169442
2025-12-04T09:33:42.0685986Z  * [new tag]                 ciflow/xpu/169555           -> ciflow/xpu/169555
2025-12-04T09:33:42.0686947Z  * [new tag]                 cslpull75                   -> cslpull75
2025-12-04T09:33:42.0687773Z  * [new tag]                 cslpull76                   -> cslpull76
2025-12-04T09:33:42.0688797Z  * [new tag]                 cslpull77                   -> cslpull77
2025-12-04T09:33:42.0689861Z  * [new tag]                 cslpull78                   -> cslpull78
2025-12-04T09:33:42.0690947Z  * [new tag]                 cslpull79                   -> cslpull79
2025-12-04T09:33:42.0692302Z  * [new tag]                 cslpull80                   -> cslpull80
2025-12-04T09:33:42.0693331Z  * [new tag]                 cslpull81                   -> cslpull81
2025-12-04T09:33:42.0694322Z  * [new tag]                 cslpull82                   -> cslpull82
2025-12-04T09:33:42.0695292Z  * [new tag]                 cslpull83                   -> cslpull83
2025-12-04T09:33:42.0696266Z  * [new tag]                 cslpull84                   -> cslpull84
2025-12-04T09:33:42.0697074Z  * [new tag]                 cslpull85                   -> cslpull85
2025-12-04T09:33:42.0698270Z  * [new tag]                 cslpull86                   -> cslpull86
2025-12-04T09:33:42.0699260Z  * [new tag]                 cslpull87                   -> cslpull87
2025-12-04T09:33:42.0700269Z  * [new tag]                 cslpull88                   -> cslpull88
2025-12-04T09:33:42.0701143Z  * [new tag]                 cslpull89                   -> cslpull89
2025-12-04T09:33:42.0702198Z  * [new tag]                 cslpull90                   -> cslpull90
2025-12-04T09:33:42.0703588Z  * [new tag]                 cslpull91                   -> cslpull91
2025-12-04T09:33:42.0704495Z  * [new tag]                 cslpull92                   -> cslpull92
2025-12-04T09:33:42.0705636Z  * [new tag]                 flight_5                    -> flight_5
2025-12-04T09:33:42.0706799Z  * [new tag]                 flight_5.1                  -> flight_5.1
2025-12-04T09:33:42.0707785Z  * [new tag]                 flight_5.2                  -> flight_5.2
2025-12-04T09:33:42.0708865Z  * [new tag]                 flight_5.3                  -> flight_5.3
2025-12-04T09:33:42.0709912Z  * [new tag]                 forpull1                    -> forpull1
2025-12-04T09:33:42.0711148Z  * [new tag]                 malfet/tag-2ef5611          -> malfet/tag-2ef5611
2025-12-04T09:33:42.0712116Z  * [new tag]                 malfet/tag-317b1a0          -> malfet/tag-317b1a0
2025-12-04T09:33:42.0713137Z  * [new tag]                 malfet/tag-ec6f767          -> malfet/tag-ec6f767
2025-12-04T09:33:42.0714178Z  * [new tag]                 nightly-binary              -> nightly-binary
2025-12-04T09:33:42.0715235Z  * [new tag]                 sqzhang_flight4_plus        -> sqzhang_flight4_plus
2025-12-04T09:33:42.0716378Z  * [new tag]                 sqzhang_flight_3            -> sqzhang_flight_3
2025-12-04T09:33:42.0717752Z  * [new tag]                 trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272 -> trunk/02d8bd6974cf84b721680d773dbdb1b6f40ce272
2025-12-04T09:33:42.0718805Z  * [new tag]                 trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e -> trunk/066997fb38ade71e00d78e9d572e380b5f02bd3e
2025-12-04T09:33:42.0720234Z  * [new tag]                 trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88 -> trunk/076e7b19fa1d481ad778d06d2b49ba57d3ce8c88
2025-12-04T09:33:42.0721651Z  * [new tag]                 trunk/07dcc0b83db3211653a38565a24e15acdba75654 -> trunk/07dcc0b83db3211653a38565a24e15acdba75654
2025-12-04T09:33:42.0722690Z  * [new tag]                 trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb -> trunk/082e96b68dfcd16cab7cfafc4d3d055767dab3eb
2025-12-04T09:33:42.0723969Z  * [new tag]                 trunk/088048f2fea28ff7d450f65c72419ca45780d30b -> trunk/088048f2fea28ff7d450f65c72419ca45780d30b
2025-12-04T09:33:42.0724948Z  * [new tag]                 trunk/09076941a95c76f4d9ad189d064dfd8baa39e672 -> trunk/09076941a95c76f4d9ad189d064dfd8baa39e672
2025-12-04T09:33:42.0725930Z  * [new tag]                 trunk/0b80a4c62b94402844bf221791c096b0035c6d75 -> trunk/0b80a4c62b94402844bf221791c096b0035c6d75
2025-12-04T09:33:42.0727265Z  * [new tag]                 trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2 -> trunk/0bbbdf1750567a980634ad907a325357ba8ba8f2
2025-12-04T09:33:42.0728334Z  * [new tag]                 trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5 -> trunk/0c281dd78773b2bc17c58ead0e4cd4ac46e775c5
2025-12-04T09:33:42.0729290Z  * [new tag]                 trunk/135f3753c418a6879b1954904184937b67e61688 -> trunk/135f3753c418a6879b1954904184937b67e61688
2025-12-04T09:33:42.0730348Z  * [new tag]                 trunk/15da21026cb13cd20257dc9e96830db108743c10 -> trunk/15da21026cb13cd20257dc9e96830db108743c10
2025-12-04T09:33:42.0731449Z  * [new tag]                 trunk/166efdad2ac827f30fb02504c6017520257f88ec -> trunk/166efdad2ac827f30fb02504c6017520257f88ec
2025-12-04T09:33:42.0732459Z  * [new tag]                 trunk/174272c15fae553d8488140af931f7d8050a313f -> trunk/174272c15fae553d8488140af931f7d8050a313f
2025-12-04T09:33:42.0733776Z  * [new tag]                 trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11 -> trunk/18f3ca08f13b8de61307f5e8cd7d4cccb67e9d11
2025-12-04T09:33:42.0735330Z  * [new tag]                 trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63 -> trunk/1902eddfe655a15ebcf2c72bd81ade110fdeef63
2025-12-04T09:33:42.0736311Z  * [new tag]                 trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5 -> trunk/195f92e98d3d66738577f11f22c4b5c8a1c76dd5
2025-12-04T09:33:42.0737350Z  * [new tag]                 trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676 -> trunk/1aa13e17de39e3c768ea7aebaad166ce72a06676
2025-12-04T09:33:42.0738396Z  * [new tag]                 trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e -> trunk/1afe2832f58e24e54a5bfda5a5afa9b96fdea40e
2025-12-04T09:33:42.0739391Z  * [new tag]                 trunk/1c87554d74140eaee964ca8b1832cede67f5f520 -> trunk/1c87554d74140eaee964ca8b1832cede67f5f520
2025-12-04T09:33:42.0740462Z  * [new tag]                 trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8 -> trunk/1ccb743b7b5be955f49736c162c4f5004b8a0dd8
2025-12-04T09:33:42.0741684Z  * [new tag]                 trunk/1cee47d6ce0a02227185b566593f002dd639ca0c -> trunk/1cee47d6ce0a02227185b566593f002dd639ca0c
2025-12-04T09:33:42.0744381Z  * [new tag]                 trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d -> trunk/1d21b4df2babe322e5d085ceb6de884eb260a62d
2025-12-04T09:33:42.0744872Z  * [new tag]                 trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8 -> trunk/1e34fb2550e4aa650314f7a6d9f6daf4da7478a8
2025-12-04T09:33:42.0745335Z  * [new tag]                 trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de -> trunk/1e526fb5b1d93bfc70691c5c3955fdffc1b7b7de
2025-12-04T09:33:42.0745821Z  * [new tag]                 trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543 -> trunk/1ee32a8b1f554a312d79bad01ded24f38cd95543
2025-12-04T09:33:42.0746673Z  * [new tag]                 trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7 -> trunk/201e2c4117eb9744594dad6a5c18213d7b4705d7
2025-12-04T09:33:42.0747683Z  * [new tag]                 trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f -> trunk/2353a0f60eb4b4cb6675907a7fa9fbedc1c02e7f
2025-12-04T09:33:42.0748931Z  * [new tag]                 trunk/285779b1621cf9f073a062b0889a642d200308d9 -> trunk/285779b1621cf9f073a062b0889a642d200308d9
2025-12-04T09:33:42.0749749Z  * [new tag]                 trunk/2887faaec6295d081580d09fce161201826c6d87 -> trunk/2887faaec6295d081580d09fce161201826c6d87
2025-12-04T09:33:42.0750787Z  * [new tag]                 trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc -> trunk/296e67c92635443c67b11c0ae1bd045f03ebb7bc
2025-12-04T09:33:42.0751834Z  * [new tag]                 trunk/29856679769b3dede478767e2fe6cfb51197cb25 -> trunk/29856679769b3dede478767e2fe6cfb51197cb25
2025-12-04T09:33:42.0752922Z  * [new tag]                 trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563 -> trunk/29e5455a4740c326ab187c7aa7b5ef98034ea563
2025-12-04T09:33:42.0753941Z  * [new tag]                 trunk/2ac3ef882afb23136adc188975f0a8802fc68adf -> trunk/2ac3ef882afb23136adc188975f0a8802fc68adf
2025-12-04T09:33:42.0754811Z  * [new tag]                 trunk/2bec68e73b64715354af076ad309335f943e36cd -> trunk/2bec68e73b64715354af076ad309335f943e36cd
2025-12-04T09:33:42.0755804Z  * [new tag]                 trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1 -> trunk/2c87367e6f88662cd5cedbd1537748b7948c38e1
2025-12-04T09:33:42.0756911Z  * [new tag]                 trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708 -> trunk/2d1f78fe3ec13820f136a2e0336da12a25f41708
2025-12-04T09:33:42.0758202Z  * [new tag]                 trunk/2df6058f116a65722a0e03073402feb242572d35 -> trunk/2df6058f116a65722a0e03073402feb242572d35
2025-12-04T09:33:42.0759172Z  * [new tag]                 trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec -> trunk/2e0c2e170fe658c440775c8e5c44228aafcc47ec
2025-12-04T09:33:42.0760410Z  * [new tag]                 trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94 -> trunk/2f9b7dad7b5419b063bd0f2e204de192720ebb94
2025-12-04T09:33:42.0761334Z  * [new tag]                 trunk/305168768a95d69c444df5cd334bb774edfe06f1 -> trunk/305168768a95d69c444df5cd334bb774edfe06f1
2025-12-04T09:33:42.0762508Z  * [new tag]                 trunk/31fc12773026e8e00f054dd79ad9b2491e693b48 -> trunk/31fc12773026e8e00f054dd79ad9b2491e693b48
2025-12-04T09:33:42.0763602Z  * [new tag]                 trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991 -> trunk/320de0c6b0a3e7c6d2693ea5c28d5d0156ba7991
2025-12-04T09:33:42.0765542Z  * [new tag]                 trunk/3418bd29475dff06695045fcdf93e7d0dac67da8 -> trunk/3418bd29475dff06695045fcdf93e7d0dac67da8
2025-12-04T09:33:42.0766109Z  * [new tag]                 trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf -> trunk/34a98608afa0cb5b48f0d6d30432fdd0a2614ddf
2025-12-04T09:33:42.0766759Z  * [new tag]                 trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee -> trunk/35b7a9a26c5923d98aebaa41a031dae21788a9ee
2025-12-04T09:33:42.0767781Z  * [new tag]                 trunk/39d07dbf03a911bdd45d1af78d8638dc92074938 -> trunk/39d07dbf03a911bdd45d1af78d8638dc92074938
2025-12-04T09:33:42.0768597Z  * [new tag]                 trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725 -> trunk/3cd98b4205ada151042cc7ff097a82d4a4b18725
2025-12-04T09:33:42.0769663Z  * [new tag]                 trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae -> trunk/3d35fd20a78ff4d016fa80f4e5fad37191d7bcae
2025-12-04T09:33:42.0770684Z  * [new tag]                 trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f -> trunk/409a5fee945c46a3edaf5df162812f201bfd7b2f
2025-12-04T09:33:42.0771690Z  * [new tag]                 trunk/42e9005cda22da3f1c559c3649218cebd671027c -> trunk/42e9005cda22da3f1c559c3649218cebd671027c
2025-12-04T09:33:42.0772766Z  * [new tag]                 trunk/43b94713bbf340d3c124fde02d0f73add4021247 -> trunk/43b94713bbf340d3c124fde02d0f73add4021247
2025-12-04T09:33:42.0773794Z  * [new tag]                 trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c -> trunk/44ac69388a4a5eb463dbd2a13f00d1e3b924566c
2025-12-04T09:33:42.0774844Z  * [new tag]                 trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a -> trunk/45d14e2497292be06ad36eaa1aaaf7c630a2586a
2025-12-04T09:33:42.0775772Z  * [new tag]                 trunk/45d310ad84854dff730c0b12e577d7998d978686 -> trunk/45d310ad84854dff730c0b12e577d7998d978686
2025-12-04T09:33:42.0777173Z  * [new tag]                 trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54 -> trunk/47b28ddf7bd74b50fa93b307a7d3b183a6d77f54
2025-12-04T09:33:42.0778008Z  * [new tag]                 trunk/481e5ab336275bd3acd5fa8a611b05b4469012af -> trunk/481e5ab336275bd3acd5fa8a611b05b4469012af
2025-12-04T09:33:42.0779107Z  * [new tag]                 trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96 -> trunk/491731647f6b8a9345dcfb3bc9416aea254a7d96
2025-12-04T09:33:42.0780231Z  * [new tag]                 trunk/49a04d26088acc17d948ddd66920f3e16371e873 -> trunk/49a04d26088acc17d948ddd66920f3e16371e873
2025-12-04T09:33:42.0781259Z  * [new tag]                 trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985 -> trunk/4bebc827c47d2f1f0fa1a417a5201a97aef3d985
2025-12-04T09:33:42.0782135Z  * [new tag]                 trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f -> trunk/4c246677784c6a14bc2dbb9ff8773ef0a3a3222f
2025-12-04T09:33:42.0783291Z  * [new tag]                 trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa -> trunk/4cfb47ff548b6d996641058cf04a70e311a4c3aa
2025-12-04T09:33:42.0784421Z  * [new tag]                 trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c -> trunk/4e0061c1aa52f606dda8cfab0bd7591e588faf2c
2025-12-04T09:33:42.0786053Z  * [new tag]                 trunk/4fefb8e7e942386ffac764a41b232241f82bea3a -> trunk/4fefb8e7e942386ffac764a41b232241f82bea3a
2025-12-04T09:33:42.0787032Z  * [new tag]                 trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d -> trunk/503b2640023521f5a35cd9a52fc8033d73a95d0d
2025-12-04T09:33:42.0788063Z  * [new tag]                 trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9 -> trunk/518c2b1b3dab9a2ef2849e04b3bc2f20c1c41db9
2025-12-04T09:33:42.0789091Z  * [new tag]                 trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3 -> trunk/5191b2fa68ba19960912bfd7fd721c79d76bb1f3
2025-12-04T09:33:42.0790281Z  * [new tag]                 trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a -> trunk/52ac0f0dc4acacd219f1317fbc28ec631c01e07a
2025-12-04T09:33:42.0791361Z  * [new tag]                 trunk/539ba711b029de9f191070f4f0d12f18f5b7f292 -> trunk/539ba711b029de9f191070f4f0d12f18f5b7f292
2025-12-04T09:33:42.0792387Z  * [new tag]                 trunk/556375b55deebebbc56cb7aef81f4d52f031ba28 -> trunk/556375b55deebebbc56cb7aef81f4d52f031ba28
2025-12-04T09:33:42.0793574Z  * [new tag]                 trunk/55c4ab554845481d0a69a3811937575fe8bb1a66 -> trunk/55c4ab554845481d0a69a3811937575fe8bb1a66
2025-12-04T09:33:42.0794571Z  * [new tag]                 trunk/5634469fda9e5d98869c82c7d03bb08914245f96 -> trunk/5634469fda9e5d98869c82c7d03bb08914245f96
2025-12-04T09:33:42.0795396Z  * [new tag]                 trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc -> trunk/5778f6ff894686a975a9a23645178ae4c87ad5dc
2025-12-04T09:33:42.0796468Z  * [new tag]                 trunk/587d63a3e07de5dc91065f9ef70bcacda9989068 -> trunk/587d63a3e07de5dc91065f9ef70bcacda9989068
2025-12-04T09:33:42.0797565Z  * [new tag]                 trunk/597930f6b568852356ca9795dac76f9e4653adbd -> trunk/597930f6b568852356ca9795dac76f9e4653adbd
2025-12-04T09:33:42.0798497Z  * [new tag]                 trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6 -> trunk/597df3a4e2a67b9fdbe1a89b2f4d74f822274db6
2025-12-04T09:33:42.0799597Z  * [new tag]                 trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883 -> trunk/59abd50e931f4efb21b053f7a2911f5d8a49d883
2025-12-04T09:33:42.0800658Z  * [new tag]                 trunk/5a607febc04c3a2b5824c75f3f60307867439a2c -> trunk/5a607febc04c3a2b5824c75f3f60307867439a2c
2025-12-04T09:33:42.0804703Z  * [new tag]                 trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b -> trunk/5bf1cdf4755c54ef462b44cb8041b0a57311556b
2025-12-04T09:33:42.0806168Z  * [new tag]                 trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c -> trunk/5f0030ba63d334d7e8c93a09e41403b89e4c573c
2025-12-04T09:33:42.0807054Z  * [new tag]                 trunk/5f21d27e71268464d362a96c9ac09ea475f7f202 -> trunk/5f21d27e71268464d362a96c9ac09ea475f7f202
2025-12-04T09:33:42.0808164Z  * [new tag]                 trunk/5fafc13038c9988d9ac21fa793fbd5890604b447 -> trunk/5fafc13038c9988d9ac21fa793fbd5890604b447
2025-12-04T09:33:42.0809301Z  * [new tag]                 trunk/61be54a31dc09b59d99b62176fb935aee0b924ef -> trunk/61be54a31dc09b59d99b62176fb935aee0b924ef
2025-12-04T09:33:42.0810339Z  * [new tag]                 trunk/62d3ccd71484ed6a760d909b41487101bbc65719 -> trunk/62d3ccd71484ed6a760d909b41487101bbc65719
2025-12-04T09:33:42.0811397Z  * [new tag]                 trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b -> trunk/641cdb68ae27668eb441d0e49c87a0602c120c2b
2025-12-04T09:33:42.0812397Z  * [new tag]                 trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a -> trunk/65c4620d6bb0c6029f69762c22b91dda2294da9a
2025-12-04T09:33:42.0813508Z  * [new tag]                 trunk/66004b993744b4106bf8afaba71f3c228a804206 -> trunk/66004b993744b4106bf8afaba71f3c228a804206
2025-12-04T09:33:42.0814528Z  * [new tag]                 trunk/6658a04c7ca67acb64512341342e7b3ee13ee386 -> trunk/6658a04c7ca67acb64512341342e7b3ee13ee386
2025-12-04T09:33:42.0815557Z  * [new tag]                 trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4 -> trunk/6864e309092a71f8ab0ca6a4dc7f8a4073fd31c4
2025-12-04T09:33:42.0816774Z  * [new tag]                 trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d -> trunk/6c261c6cb07892c90ca19ed51c9705b1659a3f7d
2025-12-04T09:33:42.0817729Z  * [new tag]                 trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b -> trunk/6c8b6a043f1628188b6396b3a2a6e000ca68362b
2025-12-04T09:33:42.0818710Z  * [new tag]                 trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5 -> trunk/6ceb4a32f92ae67ce5d7d97931d17401ebf5ffa5
2025-12-04T09:33:42.0819712Z  * [new tag]                 trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8 -> trunk/6e404e9b7d6f5fb0de86aa73888c3038248c17f8
2025-12-04T09:33:42.0820829Z  * [new tag]                 trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec -> trunk/6ec30b490aee1db6bcdc7340abddef25784f08ec
2025-12-04T09:33:42.0821938Z  * [new tag]                 trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71 -> trunk/6f2783a6c08e1db34275ff25176ffe9aebc30a71
2025-12-04T09:33:42.0822945Z  * [new tag]                 trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d -> trunk/6f53fefeb90ad3281119b5cfc4aa9ffd8a066e3d
2025-12-04T09:33:42.0823984Z  * [new tag]                 trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a -> trunk/6f7dcf51e46d0c880db1a2f5c70de57adb576f4a
2025-12-04T09:33:42.0825096Z  * [new tag]                 trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e -> trunk/6ff831180d2fa436c7f1c1af3adac641fce9d60e
2025-12-04T09:33:42.0826216Z  * [new tag]                 trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8 -> trunk/70076464a63ab218a7ceefb0e76ccd7131deb8f8
2025-12-04T09:33:42.0827137Z  * [new tag]                 trunk/70d797a5fc109b20a517646fcaa819477cd0d485 -> trunk/70d797a5fc109b20a517646fcaa819477cd0d485
2025-12-04T09:33:42.0828130Z  * [new tag]                 trunk/7348cb355ff0a6f79cd4871215aea72185748734 -> trunk/7348cb355ff0a6f79cd4871215aea72185748734
2025-12-04T09:33:42.0829216Z  * [new tag]                 trunk/74fe26a1ebe32931783569f2e762e3c2c974901f -> trunk/74fe26a1ebe32931783569f2e762e3c2c974901f
2025-12-04T09:33:42.0830426Z  * [new tag]                 trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696 -> trunk/76aeb8c7e0f795b3fddca134cbea9a69da3ee696
2025-12-04T09:33:42.0831279Z  * [new tag]                 trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f -> trunk/7716da9fb23f27a65b41f9f016a2afadf281c18f
2025-12-04T09:33:42.0832361Z  * [new tag]                 trunk/7741edd4ed665f3988052e260863efb508d61a03 -> trunk/7741edd4ed665f3988052e260863efb508d61a03
2025-12-04T09:33:42.0833472Z  * [new tag]                 trunk/78adb3b3df41b45d2368b67226d2f864b78939a6 -> trunk/78adb3b3df41b45d2368b67226d2f864b78939a6
2025-12-04T09:33:42.0834559Z  * [new tag]                 trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7 -> trunk/79d7b178225e5ed24d4e1db74e5abbff848f5fb7
2025-12-04T09:33:42.0835399Z  * [new tag]                 trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3 -> trunk/7a1e316115fc6996b3f2336822ba5d5f6179f0c3
2025-12-04T09:33:42.0836422Z  * [new tag]                 trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca -> trunk/7a41b66367c38d0af3e8a90f7be48d6b281e7bca
2025-12-04T09:33:42.0837450Z  * [new tag]                 trunk/7b7af390ea8541c611d1ce2018a6934188fc197b -> trunk/7b7af390ea8541c611d1ce2018a6934188fc197b
2025-12-04T09:33:42.0838463Z  * [new tag]                 trunk/7ba4680f3755a560af81aa0f688791e367aa3609 -> trunk/7ba4680f3755a560af81aa0f688791e367aa3609
2025-12-04T09:33:42.0839645Z  * [new tag]                 trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b -> trunk/7bc2a66ded06a0b2549aa51d807edc5dc3e73d1b
2025-12-04T09:33:42.0840488Z  * [new tag]                 trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9 -> trunk/7c648509a7470ace9fb2bae960dd4790f7e943e9
2025-12-04T09:33:42.0841487Z  * [new tag]                 trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8 -> trunk/7cbc2d034cecd21ab5c9707d0a9c525c17143fb8
2025-12-04T09:33:42.0842527Z  * [new tag]                 trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed -> trunk/7d1bbaf4ba301ea3fba6f3c7bc02d58f6417aaed
2025-12-04T09:33:42.0843713Z  * [new tag]                 trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8 -> trunk/7d2a33e4ebf60b217a3cd77feae19231eb996fc8
2025-12-04T09:33:42.0844664Z  * [new tag]                 trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e -> trunk/7eb625920054b1126a7d2d99818aaa188c6ba95e
2025-12-04T09:33:42.0845610Z  * [new tag]                 trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead -> trunk/7f55ba19c456a3d6cc443dd9edb6bb7cca677ead
2025-12-04T09:33:42.0846686Z  * [new tag]                 trunk/81af382128efa094d8702e18f2c133760904c718 -> trunk/81af382128efa094d8702e18f2c133760904c718
2025-12-04T09:33:42.0848106Z  * [new tag]                 trunk/84149583d483e9c973c9a0feda70e4f3964947b0 -> trunk/84149583d483e9c973c9a0feda70e4f3964947b0
2025-12-04T09:33:42.0849578Z  * [new tag]                 trunk/85a315917efe82c24306be805c584ec044951c75 -> trunk/85a315917efe82c24306be805c584ec044951c75
2025-12-04T09:33:42.0850557Z  * [new tag]                 trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece -> trunk/87329491c82a5f8c1cc4ec11d8f55a5de2551ece
2025-12-04T09:33:42.0851468Z  * [new tag]                 trunk/892640e25aeefa8007c5af837214b4502b6b62a6 -> trunk/892640e25aeefa8007c5af837214b4502b6b62a6
2025-12-04T09:33:42.0852854Z  * [new tag]                 trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4 -> trunk/89e3bbcb5b5321dc8b9520b4d5a8ee60cea1d0b4
2025-12-04T09:33:42.0853827Z  * [new tag]                 trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c -> trunk/8c73bbbb02159223c0c97d268a0a74cb78158a1c
2025-12-04T09:33:42.0854835Z  * [new tag]                 trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43 -> trunk/8d56e98c8db988a22cb2dfaeefb30bc7d2a3cc43
2025-12-04T09:33:42.0855953Z  * [new tag]                 trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922 -> trunk/8d9dd9603e5ee26c01007f0cd4f018e584840922
2025-12-04T09:33:42.0857048Z  * [new tag]                 trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca -> trunk/8ef0c0b02b062d75e7c9be2594914a3e784d23ca
2025-12-04T09:33:42.0858137Z  * [new tag]                 trunk/90b27e7e8352cde97d32ddad24740ef819633f38 -> trunk/90b27e7e8352cde97d32ddad24740ef819633f38
2025-12-04T09:33:42.0859032Z  * [new tag]                 trunk/90f0139e64b2951815d524b6a373bed20c4fbf90 -> trunk/90f0139e64b2951815d524b6a373bed20c4fbf90
2025-12-04T09:33:42.0859927Z  * [new tag]                 trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c -> trunk/93d0d6838c56af59b0dba794e6aa08f0c1c7799c
2025-12-04T09:33:42.0861029Z  * [new tag]                 trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87 -> trunk/94ca8d5f1e81fea3ae488650a0fb6795049a9f87
2025-12-04T09:33:42.0862079Z  * [new tag]                 trunk/9844fbeadd5cebdf1281d6fbf79164139c352693 -> trunk/9844fbeadd5cebdf1281d6fbf79164139c352693
2025-12-04T09:33:42.0863186Z  * [new tag]                 trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa -> trunk/99024dec888ec1e50b546822a32b6fb2f35e5eaa
2025-12-04T09:33:42.0864325Z  * [new tag]                 trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d -> trunk/9a296e640fc88aa44d275b48cd9cc30c573b169d
2025-12-04T09:33:42.0865398Z  * [new tag]                 trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639 -> trunk/9b3e34d8589b29f7b4e7fab6f78711b7ca6e4639
2025-12-04T09:33:42.0866455Z  * [new tag]                 trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8 -> trunk/9cd055e547e9b67a5f9827f8999c38d7eda1bcb8
2025-12-04T09:33:42.0867472Z  * [new tag]                 trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d -> trunk/9f0df5686cb4ada94f94620acba2e3c3f363b11d
2025-12-04T09:33:42.0868530Z  * [new tag]                 trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a -> trunk/9f7fceb887d0cfa0326a59b887821c63ff11340a
2025-12-04T09:33:42.0869643Z  * [new tag]                 trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742 -> trunk/9f8ef8855d3078d70f7b782540ff2aaf158d6742
2025-12-04T09:33:42.0870832Z  * [new tag]                 trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098 -> trunk/9fb52efc797b47a1f425a03aa5e47b866d8b1098
2025-12-04T09:33:42.0871862Z  * [new tag]                 trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa -> trunk/9ff4a2ebc5762d46c73e46b1b523d7ff349fedfa
2025-12-04T09:33:42.0873124Z  * [new tag]                 trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d -> trunk/a0f3937b94422354538ebbd47202d5b0e8a3fd0d
2025-12-04T09:33:42.0874610Z  * [new tag]                 trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c -> trunk/a15066c28b3145e6edbfc88359d0411d14cfc70c
2025-12-04T09:33:42.0875556Z  * [new tag]                 trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90 -> trunk/a20f775e82564d2a9979221ed7f3b8d7cf54ce90
2025-12-04T09:33:42.0876557Z  * [new tag]                 trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c -> trunk/a2973fb00ec002dd4b6bbf07385f066efb259b8c
2025-12-04T09:33:42.0877441Z  * [new tag]                 trunk/a7dc6dab9ad911259d4801c502907e531594db45 -> trunk/a7dc6dab9ad911259d4801c502907e531594db45
2025-12-04T09:33:42.0878599Z  * [new tag]                 trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109 -> trunk/a951a9cee65c01660bbc6e6fded90ecb10fa6109
2025-12-04T09:33:42.0879670Z  * [new tag]                 trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e -> trunk/abfa1a6d65c7c159e35c72c25979b9da4971689e
2025-12-04T09:33:42.0880810Z  * [new tag]                 trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e -> trunk/ae3a2395bf66151078e2d201716f7d63ce1c6f3e
2025-12-04T09:33:42.0881708Z  * [new tag]                 trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e -> trunk/afdff7f0325080dedac44d080cb5a3b0e65e6c5e
2025-12-04T09:33:42.0882775Z  * [new tag]                 trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48 -> trunk/b1aed4e7a72c03a38f44543aaea0dae2e9b76d48
2025-12-04T09:33:42.0883917Z  * [new tag]                 trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62 -> trunk/b1decff555cd50e2123c8c6e25cc0d447c411f62
2025-12-04T09:33:42.0885039Z  * [new tag]                 trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2 -> trunk/b2b6b034c9fd08672c40e63ef243556ad4c49bd2
2025-12-04T09:33:42.0886101Z  * [new tag]                 trunk/b39813b4a04931682b0491adba2138d01d716d99 -> trunk/b39813b4a04931682b0491adba2138d01d716d99
2025-12-04T09:33:42.0887200Z  * [new tag]                 trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24 -> trunk/b3a7edb2311367974cc7cd764cfb11a5d6758b24
2025-12-04T09:33:42.0888285Z  * [new tag]                 trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7 -> trunk/b4cc1329c86acaef6d42c1fac7169b8d870ab0d7
2025-12-04T09:33:42.0889385Z  * [new tag]                 trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a -> trunk/b555c39217f765759954a4f9f9bd1e9b87bed11a
2025-12-04T09:33:42.0890489Z  * [new tag]                 trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417 -> trunk/b6b6c80379388b7f9932c3e6a0f9907bf430e417
2025-12-04T09:33:42.0891523Z  * [new tag]                 trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4 -> trunk/b6b6d912df0b6f4082f8e50b18bd1de1dd7325f4
2025-12-04T09:33:42.0892748Z  * [new tag]                 trunk/b7d60685f8cbc939b68a20871e90db67e729329b -> trunk/b7d60685f8cbc939b68a20871e90db67e729329b
2025-12-04T09:33:42.0893858Z  * [new tag]                 trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e -> trunk/b7f6b9a4fc6259f7af068f31868b3119bb1bac3e
2025-12-04T09:33:42.0894999Z  * [new tag]                 trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf -> trunk/b8c4ba3593761e7b2a3ebd86f040fb07b47c02cf
2025-12-04T09:33:42.0895947Z  * [new tag]                 trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5 -> trunk/b9c8f3a4884befb965ff42620ce44a71b04887f5
2025-12-04T09:33:42.0896991Z  * [new tag]                 trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f -> trunk/ba1412546f3082c0958c077acc2025e4dbc33f1f
2025-12-04T09:33:42.0898077Z  * [new tag]                 trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f -> trunk/bac403c0b38c63bdbcc0c31f1c2b0bc0260f610f
2025-12-04T09:33:42.0899451Z  * [new tag]                 trunk/bb3034198b459401fabeab254e1b99f0115046e2 -> trunk/bb3034198b459401fabeab254e1b99f0115046e2
2025-12-04T09:33:42.0900395Z  * [new tag]                 trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55 -> trunk/bc39b2b3bc7a6e19a42e62bd576974035086fe55
2025-12-04T09:33:42.0901893Z  * [new tag]                 trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8 -> trunk/bc43d5b297f207a11d83d77ddf0152bdaabe15a8
2025-12-04T09:33:42.0902944Z  * [new tag]                 trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09 -> trunk/bc6a4863c7246a6493d16d4ea6eee71ec07c6a09
2025-12-04T09:33:42.0904050Z  * [new tag]                 trunk/bea4912944defdbcb8b061800caab6cbbbd01df5 -> trunk/bea4912944defdbcb8b061800caab6cbbbd01df5
2025-12-04T09:33:42.0905502Z  * [new tag]                 trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564 -> trunk/c04e2c656f48d82d1521b867bbbf03967b9b7564
2025-12-04T09:33:42.0906469Z  * [new tag]                 trunk/c0660bcee27e7d7731634e274576a7081882bede -> trunk/c0660bcee27e7d7731634e274576a7081882bede
2025-12-04T09:33:42.0907601Z  * [new tag]                 trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac -> trunk/c178ed43d3d99cbefe84fbfb21d6f282b20d62ac
2025-12-04T09:33:42.0915906Z  * [new tag]                 trunk/c55b1e8f61d041ee436d697449eb028931d574fb -> trunk/c55b1e8f61d041ee436d697449eb028931d574fb
2025-12-04T09:33:42.0916552Z  * [new tag]                 trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1 -> trunk/c6ae7579fe12fe75f1a8f7043a494c90567273f1
2025-12-04T09:33:42.0917168Z  * [new tag]                 trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0 -> trunk/c8210e7d94bad5ae21ac389fa4ba8a463c76c4d0
2025-12-04T09:33:42.0917925Z  * [new tag]                 trunk/cc0853af42122f8185321f542616f4474e717f09 -> trunk/cc0853af42122f8185321f542616f4474e717f09
2025-12-04T09:33:42.0918408Z  * [new tag]                 trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9 -> trunk/cddec6562eabfa390d014fa3741a5659cf9c94c9
2025-12-04T09:33:42.0918985Z  * [new tag]                 trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a -> trunk/ce5e7e3bf1f4b69a4f4f93d288ba75b906df492a
2025-12-04T09:33:42.0919474Z  * [new tag]                 trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace -> trunk/d038b0130ec7c20ebcac219301292fd8e98a1ace
2025-12-04T09:33:42.0920021Z  * [new tag]                 trunk/d16447dacaf2420ea175f0c275c75da951f57d39 -> trunk/d16447dacaf2420ea175f0c275c75da951f57d39
2025-12-04T09:33:42.0920535Z  * [new tag]                 trunk/d19f1e8cab6810bb2e99141f9976665954c67a50 -> trunk/d19f1e8cab6810bb2e99141f9976665954c67a50
2025-12-04T09:33:42.0921119Z  * [new tag]                 trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01 -> trunk/d1c9f03b2a5af4104721712f8cdffe9b4f340c01
2025-12-04T09:33:42.0921598Z  * [new tag]                 trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf -> trunk/d40f4950f2b7f7aa380a22fe0f6166e71680fbcf
2025-12-04T09:33:42.0922230Z  * [new tag]                 trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8 -> trunk/d5038950bacfe36bbf24a47a455fe76901deb8e8
2025-12-04T09:33:42.0922717Z  * [new tag]                 trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d -> trunk/d54ff42903c2ae0533931ff11d23b35f875bdb3d
2025-12-04T09:33:42.0923261Z  * [new tag]                 trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47 -> trunk/d76697633a2d2b9cced1ae21161849b33bfe7e47
2025-12-04T09:33:42.0923816Z  * [new tag]                 trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1 -> trunk/d78f52b199c547106d4cd9d2856dd0805c118bf1
2025-12-04T09:33:42.0924994Z  * [new tag]                 trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e -> trunk/d8fd5c6eed28e5004150691d048a3f6785e19a8e
2025-12-04T09:33:42.0926014Z  * [new tag]                 trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a -> trunk/d900f5e86745dec76713f4b0ef07005ef36b2f5a
2025-12-04T09:33:42.0927072Z  * [new tag]                 trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b -> trunk/d973dc6b87d763859fe1c5bd1287e3b6b1c49d1b
2025-12-04T09:33:42.0928224Z  * [new tag]                 trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec -> trunk/d998c03304cb6ede76e1ed535b4ddeb6c2bf40ec
2025-12-04T09:33:42.0929343Z  * [new tag]                 trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf -> trunk/d9cb8a70833101dbbe16b99520cfbdd70d0a87bf
2025-12-04T09:33:42.0930389Z  * [new tag]                 trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd -> trunk/d9d5e91b43f70eb8637af55db6856d49be391ffd
2025-12-04T09:33:42.0931449Z  * [new tag]                 trunk/dd18a75336a4fbd7497955cc5665904724fce889 -> trunk/dd18a75336a4fbd7497955cc5665904724fce889
2025-12-04T09:33:42.0932493Z  * [new tag]                 trunk/ded9bcd61a059bf723e6e84689552962b480ea77 -> trunk/ded9bcd61a059bf723e6e84689552962b480ea77
2025-12-04T09:33:42.0933616Z  * [new tag]                 trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c -> trunk/dfbd3714d15c37a7b83b322a6b60f997fc00f50c
2025-12-04T09:33:42.0935008Z  * [new tag]                 trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b -> trunk/e115f9f4e4b039f8e9a642aaa2bd8254a920541b
2025-12-04T09:33:42.0935823Z  * [new tag]                 trunk/e3f24fd73ad74c6e7176687986436956c7c18235 -> trunk/e3f24fd73ad74c6e7176687986436956c7c18235
2025-12-04T09:33:42.0936966Z  * [new tag]                 trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e -> trunk/e7d24d3ff93d1503ba63860b7057438ad93f918e
2025-12-04T09:33:42.0938096Z  * [new tag]                 trunk/ea7035f462a0d2830865ee86c832bd101e1427fc -> trunk/ea7035f462a0d2830865ee86c832bd101e1427fc
2025-12-04T09:33:42.0939264Z  * [new tag]                 trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3 -> trunk/eabb7ad2128580ef674446027b95bcf4e21e8df3
2025-12-04T09:33:42.0940353Z  * [new tag]                 trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf -> trunk/eb5c63652a33da42e7018c23df5f20a3eb4c6ccf
2025-12-04T09:33:42.0941517Z  * [new tag]                 trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e -> trunk/ec2c71f5c85021b8938cdafadce24c15a36fd93e
2025-12-04T09:33:42.0942499Z  * [new tag]                 trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e -> trunk/ecbcc3f6bf327856b435b259ac63cc2f328c4b4e
2025-12-04T09:33:42.0944077Z  * [new tag]                 trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2 -> trunk/ee87bbe876c42575e961b32a0827d76bc9782ca2
2025-12-04T09:33:42.0945587Z  * [new tag]                 trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4 -> trunk/ef019d1d431c4c5a95b594cb90d40a50cd00f5e4
2025-12-04T09:33:42.0946727Z  * [new tag]                 trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53 -> trunk/ef8ecc13830a86c4b231f1aad9aba7851db61b53
2025-12-04T09:33:42.0947754Z  * [new tag]                 trunk/f1076f5510920044912247b1abb8760cb820f598 -> trunk/f1076f5510920044912247b1abb8760cb820f598
2025-12-04T09:33:42.0948803Z  * [new tag]                 trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40 -> trunk/f2d6a75a00a1d648ca9a0abc6a33e14c3dea6c40
2025-12-04T09:33:42.0949912Z  * [new tag]                 trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56 -> trunk/f47dd0ddef1359e5b43e4b962412f67b30ecde56
2025-12-04T09:33:42.0950958Z  * [new tag]                 trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8 -> trunk/f49d32dfa4730dcfb1b60eeeb369b5889da983c8
2025-12-04T09:33:42.0951963Z  * [new tag]                 trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467 -> trunk/f4dedf78fc30fd4b93975787ca6074ee89db9467
2025-12-04T09:33:42.0953022Z  * [new tag]                 trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17 -> trunk/f7c0d03819ebed05c4038f095d66d1b8c54aca17
2025-12-04T09:33:42.0954149Z  * [new tag]                 trunk/f7e1bd80a063e17453c361837ba6ea2570920a73 -> trunk/f7e1bd80a063e17453c361837ba6ea2570920a73
2025-12-04T09:33:42.0955080Z  * [new tag]                 trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7 -> trunk/f9bd6c53624c7c0ea3772de78498326e84c2f0e7
2025-12-04T09:33:42.0956229Z  * [new tag]                 trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b -> trunk/fb5be221a46b51bfc9509013b0d85bc5a9d4f15b
2025-12-04T09:33:42.0957339Z  * [new tag]                 trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7 -> trunk/fdf863d5e1de3b2688c9511e96876e34581dbfd7
2025-12-04T09:33:42.0959028Z  * [new tag]                 trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307 -> trunk/fe0e65adfc0e7ca6e5f57e6ea8b16bd5cc967307
2025-12-04T09:33:42.0960084Z  * [new tag]                 trunk/fec710bf89173f5355468a7ce1afe9157c3d9009 -> trunk/fec710bf89173f5355468a7ce1afe9157c3d9009
2025-12-04T09:33:42.0961300Z  * [new tag]                 trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32 -> trunk/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:33:42.0962040Z  * [new tag]                 v0.1.1                      -> v0.1.1
2025-12-04T09:33:42.0963233Z  * [new tag]                 v0.1.10                     -> v0.1.10
2025-12-04T09:33:42.0964142Z  * [new tag]                 v0.1.11                     -> v0.1.11
2025-12-04T09:33:42.0965224Z  * [new tag]                 v0.1.12                     -> v0.1.12
2025-12-04T09:33:42.0966152Z  * [new tag]                 v0.1.2                      -> v0.1.2
2025-12-04T09:33:42.0967192Z  * [new tag]                 v0.1.3                      -> v0.1.3
2025-12-04T09:33:42.0967997Z  * [new tag]                 v0.1.4                      -> v0.1.4
2025-12-04T09:33:42.0969037Z  * [new tag]                 v0.1.5                      -> v0.1.5
2025-12-04T09:33:42.0970115Z  * [new tag]                 v0.1.6                      -> v0.1.6
2025-12-04T09:33:42.0971068Z  * [new tag]                 v0.1.7                      -> v0.1.7
2025-12-04T09:33:42.0971931Z  * [new tag]                 v0.1.8                      -> v0.1.8
2025-12-04T09:33:42.0972963Z  * [new tag]                 v0.1.9                      -> v0.1.9
2025-12-04T09:33:42.0973974Z  * [new tag]                 v0.2.0                      -> v0.2.0
2025-12-04T09:33:42.0975070Z  * [new tag]                 v0.3.0                      -> v0.3.0
2025-12-04T09:33:42.0976138Z  * [new tag]                 v0.3.1                      -> v0.3.1
2025-12-04T09:33:42.0977035Z  * [new tag]                 v0.4.0                      -> v0.4.0
2025-12-04T09:33:42.0978060Z  * [new tag]                 v0.4.1                      -> v0.4.1
2025-12-04T09:33:42.0979116Z  * [new tag]                 v1.0.0                      -> v1.0.0
2025-12-04T09:33:42.0980001Z  * [new tag]                 v1.0.0a0                    -> v1.0.0a0
2025-12-04T09:33:42.0981072Z  * [new tag]                 v1.0.1                      -> v1.0.1
2025-12-04T09:33:42.0982142Z  * [new tag]                 v1.0rc0                     -> v1.0rc0
2025-12-04T09:33:42.0982924Z  * [new tag]                 v1.0rc1                     -> v1.0rc1
2025-12-04T09:33:42.0983855Z  * [new tag]                 v1.1.0                      -> v1.1.0
2025-12-04T09:33:42.0984959Z  * [new tag]                 v1.1.0a0                    -> v1.1.0a0
2025-12-04T09:33:42.0986186Z  * [new tag]                 v1.10.0                     -> v1.10.0
2025-12-04T09:33:42.0987327Z  * [new tag]                 v1.10.0-rc1                 -> v1.10.0-rc1
2025-12-04T09:33:42.0988269Z  * [new tag]                 v1.10.0-rc2                 -> v1.10.0-rc2
2025-12-04T09:33:42.0989053Z  * [new tag]                 v1.10.0-rc3                 -> v1.10.0-rc3
2025-12-04T09:33:42.0990144Z  * [new tag]                 v1.10.1                     -> v1.10.1
2025-12-04T09:33:42.0990959Z  * [new tag]                 v1.10.1-rc1                 -> v1.10.1-rc1
2025-12-04T09:33:42.0991695Z  * [new tag]                 v1.10.2                     -> v1.10.2
2025-12-04T09:33:42.0992519Z  * [new tag]                 v1.10.2-rc1                 -> v1.10.2-rc1
2025-12-04T09:33:42.0993602Z  * [new tag]                 v1.11.0                     -> v1.11.0
2025-12-04T09:33:42.0995286Z  * [new tag]                 v1.11.0-rc1                 -> v1.11.0-rc1
2025-12-04T09:33:42.0996432Z  * [new tag]                 v1.11.0-rc2                 -> v1.11.0-rc2
2025-12-04T09:33:42.0997541Z  * [new tag]                 v1.11.0-rc3                 -> v1.11.0-rc3
2025-12-04T09:33:42.0998647Z  * [new tag]                 v1.11.0-rc4                 -> v1.11.0-rc4
2025-12-04T09:33:42.0999752Z  * [new tag]                 v1.11.0-rc5                 -> v1.11.0-rc5
2025-12-04T09:33:42.1000423Z  * [new tag]                 v1.11.0-rc6                 -> v1.11.0-rc6
2025-12-04T09:33:42.1001427Z  * [new tag]                 v1.11.0-rc7                 -> v1.11.0-rc7
2025-12-04T09:33:42.1002978Z  * [new tag]                 v1.12.0                     -> v1.12.0
2025-12-04T09:33:42.1003923Z  * [new tag]                 v1.12.0-rc1                 -> v1.12.0-rc1
2025-12-04T09:33:42.1004994Z  * [new tag]                 v1.12.0-rc2                 -> v1.12.0-rc2
2025-12-04T09:33:42.1006059Z  * [new tag]                 v1.12.0-rc3                 -> v1.12.0-rc3
2025-12-04T09:33:42.1007123Z  * [new tag]                 v1.12.0-rc4                 -> v1.12.0-rc4
2025-12-04T09:33:42.1008164Z  * [new tag]                 v1.12.0-rc5                 -> v1.12.0-rc5
2025-12-04T09:33:42.1009336Z  * [new tag]                 v1.12.0-rc6                 -> v1.12.0-rc6
2025-12-04T09:33:42.1010099Z  * [new tag]                 v1.12.0-rc7                 -> v1.12.0-rc7
2025-12-04T09:33:42.1010908Z  * [new tag]                 v1.12.0-rc8                 -> v1.12.0-rc8
2025-12-04T09:33:42.1011676Z  * [new tag]                 v1.12.1                     -> v1.12.1
2025-12-04T09:33:42.1012888Z  * [new tag]                 v1.12.1-rc1                 -> v1.12.1-rc1
2025-12-04T09:33:42.1013957Z  * [new tag]                 v1.12.1-rc2                 -> v1.12.1-rc2
2025-12-04T09:33:42.1015072Z  * [new tag]                 v1.12.1-rc3                 -> v1.12.1-rc3
2025-12-04T09:33:42.1016102Z  * [new tag]                 v1.12.1-rc4                 -> v1.12.1-rc4
2025-12-04T09:33:42.1016873Z  * [new tag]                 v1.12.1-rc5                 -> v1.12.1-rc5
2025-12-04T09:33:42.1017989Z  * [new tag]                 v1.13.0                     -> v1.13.0
2025-12-04T09:33:42.1018949Z  * [new tag]                 v1.13.0-rc1                 -> v1.13.0-rc1
2025-12-04T09:33:42.1019990Z  * [new tag]                 v1.13.0-rc2                 -> v1.13.0-rc2
2025-12-04T09:33:42.1021081Z  * [new tag]                 v1.13.0-rc3                 -> v1.13.0-rc3
2025-12-04T09:33:42.1022202Z  * [new tag]                 v1.13.0-rc4                 -> v1.13.0-rc4
2025-12-04T09:33:42.1022981Z  * [new tag]                 v1.13.0-rc5                 -> v1.13.0-rc5
2025-12-04T09:33:42.1023748Z  * [new tag]                 v1.13.0-rc6                 -> v1.13.0-rc6
2025-12-04T09:33:42.1024887Z  * [new tag]                 v1.13.1                     -> v1.13.1
2025-12-04T09:33:42.1025641Z  * [new tag]                 v1.13.1-rc1                 -> v1.13.1-rc1
2025-12-04T09:33:42.1026701Z  * [new tag]                 v1.2.0                      -> v1.2.0
2025-12-04T09:33:42.1027756Z  * [new tag]                 v1.2.0a0                    -> v1.2.0a0
2025-12-04T09:33:42.1028669Z  * [new tag]                 v1.3.0                      -> v1.3.0
2025-12-04T09:33:42.1029765Z  * [new tag]                 v1.3.0a0                    -> v1.3.0a0
2025-12-04T09:33:42.1030549Z  * [new tag]                 v1.3.1                      -> v1.3.1
2025-12-04T09:33:42.1031592Z  * [new tag]                 v1.4.0                      -> v1.4.0
2025-12-04T09:33:42.1032509Z  * [new tag]                 v1.4.0a0                    -> v1.4.0a0
2025-12-04T09:33:42.1033307Z  * [new tag]                 v1.4.1                      -> v1.4.1
2025-12-04T09:33:42.1034544Z  * [new tag]                 v1.5.0                      -> v1.5.0
2025-12-04T09:33:42.1035681Z  * [new tag]                 v1.5.0-rc1                  -> v1.5.0-rc1
2025-12-04T09:33:42.1036772Z  * [new tag]                 v1.5.0-rc2                  -> v1.5.0-rc2
2025-12-04T09:33:42.1037883Z  * [new tag]                 v1.5.0-rc3                  -> v1.5.0-rc3
2025-12-04T09:33:42.1038813Z  * [new tag]                 v1.5.0-rc4                  -> v1.5.0-rc4
2025-12-04T09:33:42.1039606Z  * [new tag]                 v1.5.0-rc5                  -> v1.5.0-rc5
2025-12-04T09:33:42.1040711Z  * [new tag]                 v1.5.1                      -> v1.5.1
2025-12-04T09:33:42.1041516Z  * [new tag]                 v1.5.1-rc1                  -> v1.5.1-rc1
2025-12-04T09:33:42.1042383Z  * [new tag]                 v1.6.0                      -> v1.6.0
2025-12-04T09:33:42.1043585Z  * [new tag]                 v1.6.0-rc1                  -> v1.6.0-rc1
2025-12-04T09:33:42.1044899Z  * [new tag]                 v1.6.0-rc2                  -> v1.6.0-rc2
2025-12-04T09:33:42.1045957Z  * [new tag]                 v1.6.0-rc3                  -> v1.6.0-rc3
2025-12-04T09:33:42.1047029Z  * [new tag]                 v1.6.0-rc4                  -> v1.6.0-rc4
2025-12-04T09:33:42.1047978Z  * [new tag]                 v1.6.0-rc5                  -> v1.6.0-rc5
2025-12-04T09:33:42.1049046Z  * [new tag]                 v1.6.0-rc6                  -> v1.6.0-rc6
2025-12-04T09:33:42.1049787Z  * [new tag]                 v1.6.0-rc7                  -> v1.6.0-rc7
2025-12-04T09:33:42.1050937Z  * [new tag]                 v1.7.0                      -> v1.7.0
2025-12-04T09:33:42.1052046Z  * [new tag]                 v1.7.0-rc1                  -> v1.7.0-rc1
2025-12-04T09:33:42.1053149Z  * [new tag]                 v1.7.0-rc2                  -> v1.7.0-rc2
2025-12-04T09:33:42.1054217Z  * [new tag]                 v1.7.0-rc3                  -> v1.7.0-rc3
2025-12-04T09:33:42.1054984Z  * [new tag]                 v1.7.0-rc4                  -> v1.7.0-rc4
2025-12-04T09:33:42.1056095Z  * [new tag]                 v1.7.1                      -> v1.7.1
2025-12-04T09:33:42.1057230Z  * [new tag]                 v1.7.1-rc1                  -> v1.7.1-rc1
2025-12-04T09:33:42.1058371Z  * [new tag]                 v1.7.1-rc2                  -> v1.7.1-rc2
2025-12-04T09:33:42.1059146Z  * [new tag]                 v1.7.1-rc3                  -> v1.7.1-rc3
2025-12-04T09:33:42.1060680Z  * [new tag]                 v1.8.0                      -> v1.8.0
2025-12-04T09:33:42.1061540Z  * [new tag]                 v1.8.0-rc1                  -> v1.8.0-rc1
2025-12-04T09:33:42.1062636Z  * [new tag]                 v1.8.0-rc2                  -> v1.8.0-rc2
2025-12-04T09:33:42.1063732Z  * [new tag]                 v1.8.0-rc3                  -> v1.8.0-rc3
2025-12-04T09:33:42.1064653Z  * [new tag]                 v1.8.0-rc4                  -> v1.8.0-rc4
2025-12-04T09:33:42.1065442Z  * [new tag]                 v1.8.0-rc5                  -> v1.8.0-rc5
2025-12-04T09:33:42.1066267Z  * [new tag]                 v1.8.1                      -> v1.8.1
2025-12-04T09:33:42.1067487Z  * [new tag]                 v1.8.1-rc1                  -> v1.8.1-rc1
2025-12-04T09:33:42.1068284Z  * [new tag]                 v1.8.1-rc2                  -> v1.8.1-rc2
2025-12-04T09:33:42.1069091Z  * [new tag]                 v1.8.1-rc3                  -> v1.8.1-rc3
2025-12-04T09:33:42.1070705Z  * [new tag]                 v1.8.2                      -> v1.8.2
2025-12-04T09:33:42.1071547Z  * [new tag]                 v1.8.2-rc1                  -> v1.8.2-rc1
2025-12-04T09:33:42.1072630Z  * [new tag]                 v1.9.0                      -> v1.9.0
2025-12-04T09:33:42.1073718Z  * [new tag]                 v1.9.0-rc1                  -> v1.9.0-rc1
2025-12-04T09:33:42.1074847Z  * [new tag]                 v1.9.0-rc2                  -> v1.9.0-rc2
2025-12-04T09:33:42.1075950Z  * [new tag]                 v1.9.0-rc3                  -> v1.9.0-rc3
2025-12-04T09:33:42.1076717Z  * [new tag]                 v1.9.0-rc4                  -> v1.9.0-rc4
2025-12-04T09:33:42.1077824Z  * [new tag]                 v1.9.1                      -> v1.9.1
2025-12-04T09:33:42.1079085Z  * [new tag]                 v1.9.1-rc1                  -> v1.9.1-rc1
2025-12-04T09:33:42.1079876Z  * [new tag]                 v1.9.1-rc2                  -> v1.9.1-rc2
2025-12-04T09:33:42.1081022Z  * [new tag]                 v2.0.0                      -> v2.0.0
2025-12-04T09:33:42.1081974Z  * [new tag]                 v2.0.0-rc1                  -> v2.0.0-rc1
2025-12-04T09:33:42.1083213Z  * [new tag]                 v2.0.0-rc2                  -> v2.0.0-rc2
2025-12-04T09:33:42.1084366Z  * [new tag]                 v2.0.0-rc3                  -> v2.0.0-rc3
2025-12-04T09:33:42.1085285Z  * [new tag]                 v2.0.0-rc4                  -> v2.0.0-rc4
2025-12-04T09:33:42.1086404Z  * [new tag]                 v2.0.0-rc5                  -> v2.0.0-rc5
2025-12-04T09:33:42.1087246Z  * [new tag]                 v2.0.0-rc6                  -> v2.0.0-rc6
2025-12-04T09:33:42.1088373Z  * [new tag]                 v2.0.1                      -> v2.0.1
2025-12-04T09:33:42.1089472Z  * [new tag]                 v2.0.1-rc1                  -> v2.0.1-rc1
2025-12-04T09:33:42.1090220Z  * [new tag]                 v2.0.1-rc2                  -> v2.0.1-rc2
2025-12-04T09:33:42.1091252Z  * [new tag]                 v2.0.1-rc3                  -> v2.0.1-rc3
2025-12-04T09:33:42.1092004Z  * [new tag]                 v2.0.1-rc4                  -> v2.0.1-rc4
2025-12-04T09:33:42.1093669Z  * [new tag]                 v2.1.0                      -> v2.1.0
2025-12-04T09:33:42.1094746Z  * [new tag]                 v2.1.0-rc1                  -> v2.1.0-rc1
2025-12-04T09:33:42.1095852Z  * [new tag]                 v2.1.0-rc2                  -> v2.1.0-rc2
2025-12-04T09:33:42.1096999Z  * [new tag]                 v2.1.0-rc3                  -> v2.1.0-rc3
2025-12-04T09:33:42.1098075Z  * [new tag]                 v2.1.0-rc4                  -> v2.1.0-rc4
2025-12-04T09:33:42.1099149Z  * [new tag]                 v2.1.0-rc5                  -> v2.1.0-rc5
2025-12-04T09:33:42.1099871Z  * [new tag]                 v2.1.0-rc6                  -> v2.1.0-rc6
2025-12-04T09:33:42.1101328Z  * [new tag]                 v2.1.1                      -> v2.1.1
2025-12-04T09:33:42.1102630Z  * [new tag]                 v2.1.1-rc1                  -> v2.1.1-rc1
2025-12-04T09:33:42.1103680Z  * [new tag]                 v2.1.1-rc2                  -> v2.1.1-rc2
2025-12-04T09:33:42.1104848Z  * [new tag]                 v2.1.1-rc3                  -> v2.1.1-rc3
2025-12-04T09:33:42.1105950Z  * [new tag]                 v2.1.1-rc4                  -> v2.1.1-rc4
2025-12-04T09:33:42.1106885Z  * [new tag]                 v2.1.1-rc5                  -> v2.1.1-rc5
2025-12-04T09:33:42.1107683Z  * [new tag]                 v2.1.1-rc6                  -> v2.1.1-rc6
2025-12-04T09:33:42.1108757Z  * [new tag]                 v2.1.2                      -> v2.1.2
2025-12-04T09:33:42.1109962Z  * [new tag]                 v2.1.2-rc1                  -> v2.1.2-rc1
2025-12-04T09:33:42.1111067Z  * [new tag]                 v2.1.2-rc2                  -> v2.1.2-rc2
2025-12-04T09:33:42.1111860Z  * [new tag]                 v2.1.2-rc3                  -> v2.1.2-rc3
2025-12-04T09:33:42.1112972Z  * [new tag]                 v2.2.0                      -> v2.2.0
2025-12-04T09:33:42.1114035Z  * [new tag]                 v2.2.0-rc1                  -> v2.2.0-rc1
2025-12-04T09:33:42.1114985Z  * [new tag]                 v2.2.0-rc2                  -> v2.2.0-rc2
2025-12-04T09:33:42.1116071Z  * [new tag]                 v2.2.0-rc3                  -> v2.2.0-rc3
2025-12-04T09:33:42.1117027Z  * [new tag]                 v2.2.0-rc4                  -> v2.2.0-rc4
2025-12-04T09:33:42.1118092Z  * [new tag]                 v2.2.0-rc5                  -> v2.2.0-rc5
2025-12-04T09:33:42.1119041Z  * [new tag]                 v2.2.0-rc6                  -> v2.2.0-rc6
2025-12-04T09:33:42.1119844Z  * [new tag]                 v2.2.0-rc7                  -> v2.2.0-rc7
2025-12-04T09:33:42.1120648Z  * [new tag]                 v2.2.0-rc8                  -> v2.2.0-rc8
2025-12-04T09:33:42.1121839Z  * [new tag]                 v2.2.1                      -> v2.2.1
2025-12-04T09:33:42.1123179Z  * [new tag]                 v2.2.1-rc1                  -> v2.2.1-rc1
2025-12-04T09:33:42.1123955Z  * [new tag]                 v2.2.1-rc2                  -> v2.2.1-rc2
2025-12-04T09:33:42.1124745Z  * [new tag]                 v2.2.1-rc3                  -> v2.2.1-rc3
2025-12-04T09:33:42.1125592Z  * [new tag]                 v2.2.2                      -> v2.2.2
2025-12-04T09:33:42.1127279Z  * [new tag]                 v2.2.2-rc1                  -> v2.2.2-rc1
2025-12-04T09:33:42.1128102Z  * [new tag]                 v2.2.2-rc2                  -> v2.2.2-rc2
2025-12-04T09:33:42.1128929Z  * [new tag]                 v2.2.2-rc3                  -> v2.2.2-rc3
2025-12-04T09:33:42.1130217Z  * [new tag]                 v2.3.0                      -> v2.3.0
2025-12-04T09:33:42.1131172Z  * [new tag]                 v2.3.0-rc1                  -> v2.3.0-rc1
2025-12-04T09:33:42.1132275Z  * [new tag]                 v2.3.0-rc10                 -> v2.3.0-rc10
2025-12-04T09:33:42.1133426Z  * [new tag]                 v2.3.0-rc11                 -> v2.3.0-rc11
2025-12-04T09:33:42.1134368Z  * [new tag]                 v2.3.0-rc12                 -> v2.3.0-rc12
2025-12-04T09:33:42.1135437Z  * [new tag]                 v2.3.0-rc2                  -> v2.3.0-rc2
2025-12-04T09:33:42.1136577Z  * [new tag]                 v2.3.0-rc3                  -> v2.3.0-rc3
2025-12-04T09:33:42.1137601Z  * [new tag]                 v2.3.0-rc4                  -> v2.3.0-rc4
2025-12-04T09:33:42.1138686Z  * [new tag]                 v2.3.0-rc5                  -> v2.3.0-rc5
2025-12-04T09:33:42.1139437Z  * [new tag]                 v2.3.0-rc6                  -> v2.3.0-rc6
2025-12-04T09:33:42.1140586Z  * [new tag]                 v2.3.0-rc7                  -> v2.3.0-rc7
2025-12-04T09:33:42.1141646Z  * [new tag]                 v2.3.0-rc8                  -> v2.3.0-rc8
2025-12-04T09:33:42.1142388Z  * [new tag]                 v2.3.0-rc9                  -> v2.3.0-rc9
2025-12-04T09:33:42.1143184Z  * [new tag]                 v2.3.1                      -> v2.3.1
2025-12-04T09:33:42.1144359Z  * [new tag]                 v2.3.1-rc1                  -> v2.3.1-rc1
2025-12-04T09:33:42.1145434Z  * [new tag]                 v2.3.1-rc2                  -> v2.3.1-rc2
2025-12-04T09:33:42.1146549Z  * [new tag]                 v2.3.1-rc3                  -> v2.3.1-rc3
2025-12-04T09:33:42.1147608Z  * [new tag]                 v2.4.0                      -> v2.4.0
2025-12-04T09:33:42.1148701Z  * [new tag]                 v2.4.0-rc1                  -> v2.4.0-rc1
2025-12-04T09:33:42.1149634Z  * [new tag]                 v2.4.0-rc2                  -> v2.4.0-rc2
2025-12-04T09:33:42.1150720Z  * [new tag]                 v2.4.0-rc3                  -> v2.4.0-rc3
2025-12-04T09:33:42.1151823Z  * [new tag]                 v2.4.0-rc4                  -> v2.4.0-rc4
2025-12-04T09:33:42.1152933Z  * [new tag]                 v2.4.0-rc5                  -> v2.4.0-rc5
2025-12-04T09:33:42.1154032Z  * [new tag]                 v2.4.0-rc6                  -> v2.4.0-rc6
2025-12-04T09:33:42.1155113Z  * [new tag]                 v2.4.0-rc7                  -> v2.4.0-rc7
2025-12-04T09:33:42.1156201Z  * [new tag]                 v2.4.0-rc8                  -> v2.4.0-rc8
2025-12-04T09:33:42.1157286Z  * [new tag]                 v2.4.0-rc9                  -> v2.4.0-rc9
2025-12-04T09:33:42.1158074Z  * [new tag]                 v2.4.1                      -> v2.4.1
2025-12-04T09:33:42.1159268Z  * [new tag]                 v2.4.1-rc1                  -> v2.4.1-rc1
2025-12-04T09:33:42.1160408Z  * [new tag]                 v2.4.1-rc2                  -> v2.4.1-rc2
2025-12-04T09:33:42.1161531Z  * [new tag]                 v2.4.1-rc3                  -> v2.4.1-rc3
2025-12-04T09:33:42.1162695Z  * [new tag]                 v2.5.0                      -> v2.5.0
2025-12-04T09:33:42.1163827Z  * [new tag]                 v2.5.0-rc1                  -> v2.5.0-rc1
2025-12-04T09:33:42.1164590Z  * [new tag]                 v2.5.0-rc10                 -> v2.5.0-rc10
2025-12-04T09:33:42.1165678Z  * [new tag]                 v2.5.0-rc2                  -> v2.5.0-rc2
2025-12-04T09:33:42.1166794Z  * [new tag]                 v2.5.0-rc3                  -> v2.5.0-rc3
2025-12-04T09:33:42.1167915Z  * [new tag]                 v2.5.0-rc4                  -> v2.5.0-rc4
2025-12-04T09:33:42.1168981Z  * [new tag]                 v2.5.0-rc5                  -> v2.5.0-rc5
2025-12-04T09:33:42.1170115Z  * [new tag]                 v2.5.0-rc6                  -> v2.5.0-rc6
2025-12-04T09:33:42.1171142Z  * [new tag]                 v2.5.0-rc7                  -> v2.5.0-rc7
2025-12-04T09:33:42.1172241Z  * [new tag]                 v2.5.0-rc8                  -> v2.5.0-rc8
2025-12-04T09:33:42.1173412Z  * [new tag]                 v2.5.0-rc9                  -> v2.5.0-rc9
2025-12-04T09:33:42.1174119Z  * [new tag]                 v2.5.1                      -> v2.5.1
2025-12-04T09:33:42.1174887Z  * [new tag]                 v2.5.1-rc1                  -> v2.5.1-rc1
2025-12-04T09:33:42.1175737Z  * [new tag]                 v2.6.0                      -> v2.6.0
2025-12-04T09:33:42.1176919Z  * [new tag]                 v2.6.0-rc1                  -> v2.6.0-rc1
2025-12-04T09:33:42.1178093Z  * [new tag]                 v2.6.0-rc2                  -> v2.6.0-rc2
2025-12-04T09:33:42.1179242Z  * [new tag]                 v2.6.0-rc3                  -> v2.6.0-rc3
2025-12-04T09:33:42.1180188Z  * [new tag]                 v2.6.0-rc4                  -> v2.6.0-rc4
2025-12-04T09:33:42.1181526Z  * [new tag]                 v2.6.0-rc5                  -> v2.6.0-rc5
2025-12-04T09:33:42.1182737Z  * [new tag]                 v2.6.0-rc6                  -> v2.6.0-rc6
2025-12-04T09:33:42.1183864Z  * [new tag]                 v2.6.0-rc7                  -> v2.6.0-rc7
2025-12-04T09:33:42.1185087Z  * [new tag]                 v2.6.0-rc8                  -> v2.6.0-rc8
2025-12-04T09:33:42.1186181Z  * [new tag]                 v2.6.0-rc9                  -> v2.6.0-rc9
2025-12-04T09:33:42.1187441Z  * [new tag]                 v2.7.0                      -> v2.7.0
2025-12-04T09:33:42.1188484Z  * [new tag]                 v2.7.0-rc1                  -> v2.7.0-rc1
2025-12-04T09:33:42.1189363Z  * [new tag]                 v2.7.0-rc10                 -> v2.7.0-rc10
2025-12-04T09:33:42.1190558Z  * [new tag]                 v2.7.0-rc2                  -> v2.7.0-rc2
2025-12-04T09:33:42.1191746Z  * [new tag]                 v2.7.0-rc3                  -> v2.7.0-rc3
2025-12-04T09:33:42.1192862Z  * [new tag]                 v2.7.0-rc4                  -> v2.7.0-rc4
2025-12-04T09:33:42.1193903Z  * [new tag]                 v2.7.0-rc5                  -> v2.7.0-rc5
2025-12-04T09:33:42.1195432Z  * [new tag]                 v2.7.0-rc6                  -> v2.7.0-rc6
2025-12-04T09:33:42.1196566Z  * [new tag]                 v2.7.0-rc7                  -> v2.7.0-rc7
2025-12-04T09:33:42.1197713Z  * [new tag]                 v2.7.0-rc8                  -> v2.7.0-rc8
2025-12-04T09:33:42.1198904Z  * [new tag]                 v2.7.0-rc9                  -> v2.7.0-rc9
2025-12-04T09:33:42.1199669Z  * [new tag]                 v2.7.1                      -> v2.7.1
2025-12-04T09:33:42.1200966Z  * [new tag]                 v2.7.1-rc1                  -> v2.7.1-rc1
2025-12-04T09:33:42.1205448Z  * [new tag]                 v2.7.1-rc2                  -> v2.7.1-rc2
2025-12-04T09:33:42.1206770Z  * [new tag]                 v2.7.1-rc3                  -> v2.7.1-rc3
2025-12-04T09:33:42.1207927Z  * [new tag]                 v2.7.1-rc4                  -> v2.7.1-rc4
2025-12-04T09:33:42.1208997Z  * [new tag]                 v2.7.1-rc5                  -> v2.7.1-rc5
2025-12-04T09:33:42.1209843Z  * [new tag]                 v2.8.0                      -> v2.8.0
2025-12-04T09:33:42.1211030Z  * [new tag]                 v2.8.0-rc1                  -> v2.8.0-rc1
2025-12-04T09:33:42.1212112Z  * [new tag]                 v2.8.0-rc2                  -> v2.8.0-rc2
2025-12-04T09:33:42.1213428Z  * [new tag]                 v2.8.0-rc3                  -> v2.8.0-rc3
2025-12-04T09:33:42.1214608Z  * [new tag]                 v2.8.0-rc4                  -> v2.8.0-rc4
2025-12-04T09:33:42.1215779Z  * [new tag]                 v2.8.0-rc5                  -> v2.8.0-rc5
2025-12-04T09:33:42.1216926Z  * [new tag]                 v2.8.0-rc6                  -> v2.8.0-rc6
2025-12-04T09:33:42.1218034Z  * [new tag]                 v2.8.0-rc7                  -> v2.8.0-rc7
2025-12-04T09:33:42.1219117Z  * [new tag]                 v2.8.0-rc8                  -> v2.8.0-rc8
2025-12-04T09:33:42.1220264Z  * [new tag]                 v2.9.0                      -> v2.9.0
2025-12-04T09:33:42.1221382Z  * [new tag]                 v2.9.0-rc1                  -> v2.9.0-rc1
2025-12-04T09:33:42.1222651Z  * [new tag]                 v2.9.0-rc10                 -> v2.9.0-rc10
2025-12-04T09:33:42.1223617Z  * [new tag]                 v2.9.0-rc11                 -> v2.9.0-rc11
2025-12-04T09:33:42.1225049Z  * [new tag]                 v2.9.0-rc2                  -> v2.9.0-rc2
2025-12-04T09:33:42.1226178Z  * [new tag]                 v2.9.0-rc3                  -> v2.9.0-rc3
2025-12-04T09:33:42.1227328Z  * [new tag]                 v2.9.0-rc4                  -> v2.9.0-rc4
2025-12-04T09:33:42.1228447Z  * [new tag]                 v2.9.0-rc5                  -> v2.9.0-rc5
2025-12-04T09:33:42.1229770Z  * [new tag]                 v2.9.0-rc6                  -> v2.9.0-rc6
2025-12-04T09:33:42.1230944Z  * [new tag]                 v2.9.0-rc7                  -> v2.9.0-rc7
2025-12-04T09:33:42.1232203Z  * [new tag]                 v2.9.0-rc8                  -> v2.9.0-rc8
2025-12-04T09:33:42.1233051Z  * [new tag]                 v2.9.0-rc9                  -> v2.9.0-rc9
2025-12-04T09:33:42.1233883Z  * [new tag]                 v2.9.1                      -> v2.9.1
2025-12-04T09:33:42.1235043Z  * [new tag]                 v2.9.1-rc1                  -> v2.9.1-rc1
2025-12-04T09:33:42.1236252Z  * [new tag]                 v2.9.1-rc2                  -> v2.9.1-rc2
2025-12-04T09:33:42.1237739Z  * [new tag]                 viable/strict/1759343184    -> viable/strict/1759343184
2025-12-04T09:33:42.1238808Z  * [new tag]                 viable/strict/1759346540    -> viable/strict/1759346540
2025-12-04T09:33:42.1239719Z  * [new tag]                 viable/strict/1759348181    -> viable/strict/1759348181
2025-12-04T09:33:42.1240893Z  * [new tag]                 viable/strict/1759350324    -> viable/strict/1759350324
2025-12-04T09:33:42.1241815Z  * [new tag]                 viable/strict/1759351793    -> viable/strict/1759351793
2025-12-04T09:33:42.1243018Z  * [new tag]                 viable/strict/1759353844    -> viable/strict/1759353844
2025-12-04T09:33:42.1244006Z  * [new tag]                 viable/strict/1759355374    -> viable/strict/1759355374
2025-12-04T09:33:42.1244956Z  * [new tag]                 viable/strict/1759357472    -> viable/strict/1759357472
2025-12-04T09:33:42.1246312Z  * [new tag]                 viable/strict/1759361002    -> viable/strict/1759361002
2025-12-04T09:33:42.1247154Z  * [new tag]                 viable/strict/1759362585    -> viable/strict/1759362585
2025-12-04T09:33:42.1248497Z  * [new tag]                 viable/strict/1759365359    -> viable/strict/1759365359
2025-12-04T09:33:42.1249568Z  * [new tag]                 viable/strict/1759370089    -> viable/strict/1759370089
2025-12-04T09:33:42.1251096Z  * [new tag]                 viable/strict/1759377554    -> viable/strict/1759377554
2025-12-04T09:33:42.1252225Z  * [new tag]                 viable/strict/1759379133    -> viable/strict/1759379133
2025-12-04T09:33:42.1253213Z  * [new tag]                 viable/strict/1759389871    -> viable/strict/1759389871
2025-12-04T09:33:42.1254301Z  * [new tag]                 viable/strict/1759393562    -> viable/strict/1759393562
2025-12-04T09:33:42.1255370Z  * [new tag]                 viable/strict/1759395076    -> viable/strict/1759395076
2025-12-04T09:33:42.1256479Z  * [new tag]                 viable/strict/1759398579    -> viable/strict/1759398579
2025-12-04T09:33:42.1257518Z  * [new tag]                 viable/strict/1759404142    -> viable/strict/1759404142
2025-12-04T09:33:42.1258522Z  * [new tag]                 viable/strict/1759405773    -> viable/strict/1759405773
2025-12-04T09:33:42.1259586Z  * [new tag]                 viable/strict/1759408041    -> viable/strict/1759408041
2025-12-04T09:33:42.1260631Z  * [new tag]                 viable/strict/1759411593    -> viable/strict/1759411593
2025-12-04T09:33:42.1261627Z  * [new tag]                 viable/strict/1759427395    -> viable/strict/1759427395
2025-12-04T09:33:42.1262719Z  * [new tag]                 viable/strict/1759434582    -> viable/strict/1759434582
2025-12-04T09:33:42.1263826Z  * [new tag]                 viable/strict/1759436720    -> viable/strict/1759436720
2025-12-04T09:33:42.1265005Z  * [new tag]                 viable/strict/1759440219    -> viable/strict/1759440219
2025-12-04T09:33:42.1265848Z  * [new tag]                 viable/strict/1759441948    -> viable/strict/1759441948
2025-12-04T09:33:42.1266994Z  * [new tag]                 viable/strict/1759443860    -> viable/strict/1759443860
2025-12-04T09:33:42.1268020Z  * [new tag]                 viable/strict/1759445377    -> viable/strict/1759445377
2025-12-04T09:33:42.1269119Z  * [new tag]                 viable/strict/1759447415    -> viable/strict/1759447415
2025-12-04T09:33:42.1270101Z  * [new tag]                 viable/strict/1759451750    -> viable/strict/1759451750
2025-12-04T09:33:42.1271223Z  * [new tag]                 viable/strict/1759453910    -> viable/strict/1759453910
2025-12-04T09:33:42.1272274Z  * [new tag]                 viable/strict/1759456483    -> viable/strict/1759456483
2025-12-04T09:33:42.1273350Z  * [new tag]                 viable/strict/1759459279    -> viable/strict/1759459279
2025-12-04T09:33:42.1274384Z  * [new tag]                 viable/strict/1759460742    -> viable/strict/1759460742
2025-12-04T09:33:42.1275585Z  * [new tag]                 viable/strict/1759462025    -> viable/strict/1759462025
2025-12-04T09:33:42.1276709Z  * [new tag]                 viable/strict/1759469086    -> viable/strict/1759469086
2025-12-04T09:33:42.1277672Z  * [new tag]                 viable/strict/1759470581    -> viable/strict/1759470581
2025-12-04T09:33:42.1278786Z  * [new tag]                 viable/strict/1759472786    -> viable/strict/1759472786
2025-12-04T09:33:42.1279773Z  * [new tag]                 viable/strict/1759476294    -> viable/strict/1759476294
2025-12-04T09:33:42.1280823Z  * [new tag]                 viable/strict/1759479963    -> viable/strict/1759479963
2025-12-04T09:33:42.1281855Z  * [new tag]                 viable/strict/1759492177    -> viable/strict/1759492177
2025-12-04T09:33:42.1282984Z  * [new tag]                 viable/strict/1759519278    -> viable/strict/1759519278
2025-12-04T09:33:42.1284013Z  * [new tag]                 viable/strict/1759524580    -> viable/strict/1759524580
2025-12-04T09:33:42.1285042Z  * [new tag]                 viable/strict/1759528193    -> viable/strict/1759528193
2025-12-04T09:33:42.1286355Z  * [new tag]                 viable/strict/1759533797    -> viable/strict/1759533797
2025-12-04T09:33:42.1287407Z  * [new tag]                 viable/strict/1759542780    -> viable/strict/1759542780
2025-12-04T09:33:42.1288454Z  * [new tag]                 viable/strict/1759549779    -> viable/strict/1759549779
2025-12-04T09:33:42.1289528Z  * [new tag]                 viable/strict/1759555455    -> viable/strict/1759555455
2025-12-04T09:33:42.1290559Z  * [new tag]                 viable/strict/1759559176    -> viable/strict/1759559176
2025-12-04T09:33:42.1291684Z  * [new tag]                 viable/strict/1759560629    -> viable/strict/1759560629
2025-12-04T09:33:42.1292698Z  * [new tag]                 viable/strict/1759569848    -> viable/strict/1759569848
2025-12-04T09:33:42.1293942Z  * [new tag]                 viable/strict/1759571382    -> viable/strict/1759571382
2025-12-04T09:33:42.1294950Z  * [new tag]                 viable/strict/1759573474    -> viable/strict/1759573474
2025-12-04T09:33:42.1295941Z  * [new tag]                 viable/strict/1759618187    -> viable/strict/1759618187
2025-12-04T09:33:42.1297039Z  * [new tag]                 viable/strict/1759626742    -> viable/strict/1759626742
2025-12-04T09:33:42.1298120Z  * [new tag]                 viable/strict/1759632427    -> viable/strict/1759632427
2025-12-04T09:33:42.1299156Z  * [new tag]                 viable/strict/1759634971    -> viable/strict/1759634971
2025-12-04T09:33:42.1300234Z  * [new tag]                 viable/strict/1759661382    -> viable/strict/1759661382
2025-12-04T09:33:42.1301509Z  * [new tag]                 viable/strict/1759663294    -> viable/strict/1759663294
2025-12-04T09:33:42.1302371Z  * [new tag]                 viable/strict/1759708178    -> viable/strict/1759708178
2025-12-04T09:33:42.1303610Z  * [new tag]                 viable/strict/1759715695    -> viable/strict/1759715695
2025-12-04T09:33:42.1304463Z  * [new tag]                 viable/strict/1759728293    -> viable/strict/1759728293
2025-12-04T09:33:42.1305653Z  * [new tag]                 viable/strict/1759735513    -> viable/strict/1759735513
2025-12-04T09:33:42.1306771Z  * [new tag]                 viable/strict/1759739177    -> viable/strict/1759739177
2025-12-04T09:33:42.1307813Z  * [new tag]                 viable/strict/1759758635    -> viable/strict/1759758635
2025-12-04T09:33:42.1308838Z  * [new tag]                 viable/strict/1759765784    -> viable/strict/1759765784
2025-12-04T09:33:42.1309999Z  * [new tag]                 viable/strict/1759767948    -> viable/strict/1759767948
2025-12-04T09:33:42.1311062Z  * [new tag]                 viable/strict/1759771461    -> viable/strict/1759771461
2025-12-04T09:33:42.1311894Z  * [new tag]                 viable/strict/1759776706    -> viable/strict/1759776706
2025-12-04T09:33:42.1313058Z  * [new tag]                 viable/strict/1759782317    -> viable/strict/1759782317
2025-12-04T09:33:42.1314199Z  * [new tag]                 viable/strict/1759783777    -> viable/strict/1759783777
2025-12-04T09:33:42.1315272Z  * [new tag]                 viable/strict/1759785815    -> viable/strict/1759785815
2025-12-04T09:33:42.1316393Z  * [new tag]                 viable/strict/1759789459    -> viable/strict/1759789459
2025-12-04T09:33:42.1317476Z  * [new tag]                 viable/strict/1759790974    -> viable/strict/1759790974
2025-12-04T09:33:42.1318325Z  * [new tag]                 viable/strict/1759794583    -> viable/strict/1759794583
2025-12-04T09:33:42.1319890Z  * [new tag]                 viable/strict/1759797408    -> viable/strict/1759797408
2025-12-04T09:33:42.1320961Z  * [new tag]                 viable/strict/1759799518    -> viable/strict/1759799518
2025-12-04T09:33:42.1322018Z  * [new tag]                 viable/strict/1759804909    -> viable/strict/1759804909
2025-12-04T09:33:42.1323176Z  * [new tag]                 viable/strict/1759807643    -> viable/strict/1759807643
2025-12-04T09:33:42.1324271Z  * [new tag]                 viable/strict/1759809089    -> viable/strict/1759809089
2025-12-04T09:33:42.1325305Z  * [new tag]                 viable/strict/1759811145    -> viable/strict/1759811145
2025-12-04T09:33:42.1326354Z  * [new tag]                 viable/strict/1759812581    -> viable/strict/1759812581
2025-12-04T09:33:42.1327417Z  * [new tag]                 viable/strict/1759814683    -> viable/strict/1759814683
2025-12-04T09:33:42.1328487Z  * [new tag]                 viable/strict/1759821889    -> viable/strict/1759821889
2025-12-04T09:33:42.1329604Z  * [new tag]                 viable/strict/1759823376    -> viable/strict/1759823376
2025-12-04T09:33:42.1330619Z  * [new tag]                 viable/strict/1759827107    -> viable/strict/1759827107
2025-12-04T09:33:42.1331639Z  * [new tag]                 viable/strict/1759830577    -> viable/strict/1759830577
2025-12-04T09:33:42.1332865Z  * [new tag]                 viable/strict/1759832720    -> viable/strict/1759832720
2025-12-04T09:33:42.1333705Z  * [new tag]                 viable/strict/1759842063    -> viable/strict/1759842063
2025-12-04T09:33:42.1334849Z  * [new tag]                 viable/strict/1759847121    -> viable/strict/1759847121
2025-12-04T09:33:42.1336233Z  * [new tag]                 viable/strict/1759850721    -> viable/strict/1759850721
2025-12-04T09:33:42.1337284Z  * [new tag]                 viable/strict/1759857870    -> viable/strict/1759857870
2025-12-04T09:33:42.1338377Z  * [new tag]                 viable/strict/1759863143    -> viable/strict/1759863143
2025-12-04T09:33:42.1339397Z  * [new tag]                 viable/strict/1759875874    -> viable/strict/1759875874
2025-12-04T09:33:42.1340233Z  * [new tag]                 viable/strict/1759877385    -> viable/strict/1759877385
2025-12-04T09:33:42.1341357Z  * [new tag]                 viable/strict/1759883801    -> viable/strict/1759883801
2025-12-04T09:33:42.1342473Z  * [new tag]                 viable/strict/1759885922    -> viable/strict/1759885922
2025-12-04T09:33:42.1343464Z  * [new tag]                 viable/strict/1759888488    -> viable/strict/1759888488
2025-12-04T09:33:42.1344569Z  * [new tag]                 viable/strict/1759895471    -> viable/strict/1759895471
2025-12-04T09:33:42.1345668Z  * [new tag]                 viable/strict/1759904803    -> viable/strict/1759904803
2025-12-04T09:33:42.1346894Z  * [new tag]                 viable/strict/1759908300    -> viable/strict/1759908300
2025-12-04T09:33:42.1347989Z  * [new tag]                 viable/strict/1759915520    -> viable/strict/1759915520
2025-12-04T09:33:42.1349028Z  * [new tag]                 viable/strict/1759916978    -> viable/strict/1759916978
2025-12-04T09:33:42.1349860Z  * [new tag]                 viable/strict/1759930024    -> viable/strict/1759930024
2025-12-04T09:33:42.1350975Z  * [new tag]                 viable/strict/1759948122    -> viable/strict/1759948122
2025-12-04T09:33:42.1352158Z  * [new tag]                 viable/strict/1759952983    -> viable/strict/1759952983
2025-12-04T09:33:42.1353264Z  * [new tag]                 viable/strict/1759955121    -> viable/strict/1759955121
2025-12-04T09:33:42.1354282Z  * [new tag]                 viable/strict/1759962298    -> viable/strict/1759962298
2025-12-04T09:33:42.1355229Z  * [new tag]                 viable/strict/1759965837    -> viable/strict/1759965837
2025-12-04T09:33:42.1356418Z  * [new tag]                 viable/strict/1759970213    -> viable/strict/1759970213
2025-12-04T09:33:42.1357489Z  * [new tag]                 viable/strict/1759974894    -> viable/strict/1759974894
2025-12-04T09:33:42.1358499Z  * [new tag]                 viable/strict/1759977763    -> viable/strict/1759977763
2025-12-04T09:33:42.1359586Z  * [new tag]                 viable/strict/1759979241    -> viable/strict/1759979241
2025-12-04T09:33:42.1360649Z  * [new tag]                 viable/strict/1759985417    -> viable/strict/1759985417
2025-12-04T09:33:42.1361680Z  * [new tag]                 viable/strict/1759987490    -> viable/strict/1759987490
2025-12-04T09:33:42.1363010Z  * [new tag]                 viable/strict/1759996180    -> viable/strict/1759996180
2025-12-04T09:33:42.1364039Z  * [new tag]                 viable/strict/1760065682    -> viable/strict/1760065682
2025-12-04T09:33:42.1365119Z  * [new tag]                 viable/strict/1760066894    -> viable/strict/1760066894
2025-12-04T09:33:42.1366184Z  * [new tag]                 viable/strict/1760070345    -> viable/strict/1760070345
2025-12-04T09:33:42.1367244Z  * [new tag]                 viable/strict/1760089782    -> viable/strict/1760089782
2025-12-04T09:33:42.1368334Z  * [new tag]                 viable/strict/1760091921    -> viable/strict/1760091921
2025-12-04T09:33:42.1369368Z  * [new tag]                 viable/strict/1760127924    -> viable/strict/1760127924
2025-12-04T09:33:42.1370483Z  * [new tag]                 viable/strict/1760129489    -> viable/strict/1760129489
2025-12-04T09:33:42.1371617Z  * [new tag]                 viable/strict/1760132980    -> viable/strict/1760132980
2025-12-04T09:33:42.1372984Z  * [new tag]                 viable/strict/1760135060    -> viable/strict/1760135060
2025-12-04T09:33:42.1374081Z  * [new tag]                 viable/strict/1760215782    -> viable/strict/1760215782
2025-12-04T09:33:42.1375167Z  * [new tag]                 viable/strict/1760273849    -> viable/strict/1760273849
2025-12-04T09:33:42.1376223Z  * [new tag]                 viable/strict/1760275517    -> viable/strict/1760275517
2025-12-04T09:33:42.1377303Z  * [new tag]                 viable/strict/1760276979    -> viable/strict/1760276979
2025-12-04T09:33:42.1378396Z  * [new tag]                 viable/strict/1760279007    -> viable/strict/1760279007
2025-12-04T09:33:42.1379405Z  * [new tag]                 viable/strict/1760286328    -> viable/strict/1760286328
2025-12-04T09:33:42.1380219Z  * [new tag]                 viable/strict/1760493304    -> viable/strict/1760493304
2025-12-04T09:33:42.1381433Z  * [new tag]                 viable/strict/1760496298    -> viable/strict/1760496298
2025-12-04T09:33:42.1382235Z  * [new tag]                 viable/strict/1760518396    -> viable/strict/1760518396
2025-12-04T09:33:42.1383431Z  * [new tag]                 viable/strict/1760534864    -> viable/strict/1760534864
2025-12-04T09:33:42.1384462Z  * [new tag]                 viable/strict/1760549062    -> viable/strict/1760549062
2025-12-04T09:33:42.1385669Z  * [new tag]                 viable/strict/1760552799    -> viable/strict/1760552799
2025-12-04T09:33:42.1386739Z  * [new tag]                 viable/strict/1760554355    -> viable/strict/1760554355
2025-12-04T09:33:42.1387823Z  * [new tag]                 viable/strict/1760556275    -> viable/strict/1760556275
2025-12-04T09:33:42.1389329Z  * [new tag]                 viable/strict/1760564979    -> viable/strict/1760564979
2025-12-04T09:33:42.1390494Z  * [new tag]                 viable/strict/1760567049    -> viable/strict/1760567049
2025-12-04T09:33:42.1392024Z  * [new tag]                 viable/strict/1760568585    -> viable/strict/1760568585
2025-12-04T09:33:42.1393068Z  * [new tag]                 viable/strict/1760570630    -> viable/strict/1760570630
2025-12-04T09:33:42.1394097Z  * [new tag]                 viable/strict/1760572180    -> viable/strict/1760572180
2025-12-04T09:33:42.1395207Z  * [new tag]                 viable/strict/1760575094    -> viable/strict/1760575094
2025-12-04T09:33:42.1396353Z  * [new tag]                 viable/strict/1760579709    -> viable/strict/1760579709
2025-12-04T09:33:42.1398004Z  * [new tag]                 viable/strict/1760582614    -> viable/strict/1760582614
2025-12-04T09:33:42.1399119Z  * [new tag]                 viable/strict/1760586815    -> viable/strict/1760586815
2025-12-04T09:33:42.1399970Z  * [new tag]                 viable/strict/1760588829    -> viable/strict/1760588829
2025-12-04T09:33:42.1401184Z  * [new tag]                 viable/strict/1760590200    -> viable/strict/1760590200
2025-12-04T09:33:42.1402488Z  * [new tag]                 viable/strict/1760592311    -> viable/strict/1760592311
2025-12-04T09:33:42.1403524Z  * [new tag]                 viable/strict/1760619733    -> viable/strict/1760619733
2025-12-04T09:33:42.1404344Z  * [new tag]                 viable/strict/1760628335    -> viable/strict/1760628335
2025-12-04T09:33:42.1405461Z  * [new tag]                 viable/strict/1760635490    -> viable/strict/1760635490
2025-12-04T09:33:42.1406521Z  * [new tag]                 viable/strict/1760640743    -> viable/strict/1760640743
2025-12-04T09:33:42.1407527Z  * [new tag]                 viable/strict/1760642528    -> viable/strict/1760642528
2025-12-04T09:33:42.1408589Z  * [new tag]                 viable/strict/1760646330    -> viable/strict/1760646330
2025-12-04T09:33:42.1409614Z  * [new tag]                 viable/strict/1760666101    -> viable/strict/1760666101
2025-12-04T09:33:42.1410752Z  * [new tag]                 viable/strict/1760668990    -> viable/strict/1760668990
2025-12-04T09:33:42.1411762Z  * [new tag]                 viable/strict/1760670600    -> viable/strict/1760670600
2025-12-04T09:33:42.1412824Z  * [new tag]                 viable/strict/1760671704    -> viable/strict/1760671704
2025-12-04T09:33:42.1413846Z  * [new tag]                 viable/strict/1760673121    -> viable/strict/1760673121
2025-12-04T09:33:42.1415015Z  * [new tag]                 viable/strict/1760675352    -> viable/strict/1760675352
2025-12-04T09:33:42.1416094Z  * [new tag]                 viable/strict/1760696731    -> viable/strict/1760696731
2025-12-04T09:33:42.1418730Z  * [new tag]                 viable/strict/1760723515    -> viable/strict/1760723515
2025-12-04T09:33:42.1419791Z  * [new tag]                 viable/strict/1760727234    -> viable/strict/1760727234
2025-12-04T09:33:42.1420879Z  * [new tag]                 viable/strict/1760730578    -> viable/strict/1760730578
2025-12-04T09:33:42.1422035Z  * [new tag]                 viable/strict/1760732726    -> viable/strict/1760732726
2025-12-04T09:33:42.1423212Z  * [new tag]                 viable/strict/1760734180    -> viable/strict/1760734180
2025-12-04T09:33:42.1424087Z  * [new tag]                 viable/strict/1760736251    -> viable/strict/1760736251
2025-12-04T09:33:42.1425298Z  * [new tag]                 viable/strict/1760737772    -> viable/strict/1760737772
2025-12-04T09:33:42.1426336Z  * [new tag]                 viable/strict/1760758005    -> viable/strict/1760758005
2025-12-04T09:33:42.1427406Z  * [new tag]                 viable/strict/1760761532    -> viable/strict/1760761532
2025-12-04T09:33:42.1428508Z  * [new tag]                 viable/strict/1760802581    -> viable/strict/1760802581
2025-12-04T09:33:42.1429538Z  * [new tag]                 viable/strict/1760827772    -> viable/strict/1760827772
2025-12-04T09:33:42.1430576Z  * [new tag]                 viable/strict/1760834524    -> viable/strict/1760834524
2025-12-04T09:33:42.1431677Z  * [new tag]                 viable/strict/1760845009    -> viable/strict/1760845009
2025-12-04T09:33:42.1432759Z  * [new tag]                 viable/strict/1760876836    -> viable/strict/1760876836
2025-12-04T09:33:42.1433834Z  * [new tag]                 viable/strict/1760880329    -> viable/strict/1760880329
2025-12-04T09:33:42.1434874Z  * [new tag]                 viable/strict/1760888987    -> viable/strict/1760888987
2025-12-04T09:33:42.1435779Z  * [new tag]                 viable/strict/1760912664    -> viable/strict/1760912664
2025-12-04T09:33:42.1436975Z  * [new tag]                 viable/strict/1760925321    -> viable/strict/1760925321
2025-12-04T09:33:42.1438006Z  * [new tag]                 viable/strict/1760931488    -> viable/strict/1760931488
2025-12-04T09:33:42.1439078Z  * [new tag]                 viable/strict/1760932693    -> viable/strict/1760932693
2025-12-04T09:33:42.1440147Z  * [new tag]                 viable/strict/1761004184    -> viable/strict/1761004184
2025-12-04T09:33:42.1441185Z  * [new tag]                 viable/strict/1761014748    -> viable/strict/1761014748
2025-12-04T09:33:42.1442336Z  * [new tag]                 viable/strict/1761017491    -> viable/strict/1761017491
2025-12-04T09:33:42.1443433Z  * [new tag]                 viable/strict/1761018806    -> viable/strict/1761018806
2025-12-04T09:33:42.1444608Z  * [new tag]                 viable/strict/1761020754    -> viable/strict/1761020754
2025-12-04T09:33:42.1445601Z  * [new tag]                 viable/strict/1761024303    -> viable/strict/1761024303
2025-12-04T09:33:42.1446649Z  * [new tag]                 viable/strict/1761029582    -> viable/strict/1761029582
2025-12-04T09:33:42.1447694Z  * [new tag]                 viable/strict/1761031535    -> viable/strict/1761031535
2025-12-04T09:33:42.1448683Z  * [new tag]                 viable/strict/1761035196    -> viable/strict/1761035196
2025-12-04T09:33:42.1449933Z  * [new tag]                 viable/strict/1761045825    -> viable/strict/1761045825
2025-12-04T09:33:42.1451145Z  * [new tag]                 viable/strict/1761054796    -> viable/strict/1761054796
2025-12-04T09:33:42.1452241Z  * [new tag]                 viable/strict/1761060314    -> viable/strict/1761060314
2025-12-04T09:33:42.1453297Z  * [new tag]                 viable/strict/1761071198    -> viable/strict/1761071198
2025-12-04T09:33:42.1454431Z  * [new tag]                 viable/strict/1761074628    -> viable/strict/1761074628
2025-12-04T09:33:42.1455530Z  * [new tag]                 viable/strict/1761078351    -> viable/strict/1761078351
2025-12-04T09:33:42.1456516Z  * [new tag]                 viable/strict/1761079822    -> viable/strict/1761079822
2025-12-04T09:33:42.1457573Z  * [new tag]                 viable/strict/1761081873    -> viable/strict/1761081873
2025-12-04T09:33:42.1458652Z  * [new tag]                 viable/strict/1761083392    -> viable/strict/1761083392
2025-12-04T09:33:42.1459723Z  * [new tag]                 viable/strict/1761085465    -> viable/strict/1761085465
2025-12-04T09:33:42.1461308Z  * [new tag]                 viable/strict/1761089099    -> viable/strict/1761089099
2025-12-04T09:33:42.1462435Z  * [new tag]                 viable/strict/1761095535    -> viable/strict/1761095535
2025-12-04T09:33:42.1463422Z  * [new tag]                 viable/strict/1761098119    -> viable/strict/1761098119
2025-12-04T09:33:42.1465006Z  * [new tag]                 viable/strict/1761101330    -> viable/strict/1761101330
2025-12-04T09:33:42.1466098Z  * [new tag]                 viable/strict/1761114425    -> viable/strict/1761114425
2025-12-04T09:33:42.1467150Z  * [new tag]                 viable/strict/1761116036    -> viable/strict/1761116036
2025-12-04T09:33:42.1468226Z  * [new tag]                 viable/strict/1761119379    -> viable/strict/1761119379
2025-12-04T09:33:42.1469306Z  * [new tag]                 viable/strict/1761121601    -> viable/strict/1761121601
2025-12-04T09:33:42.1470286Z  * [new tag]                 viable/strict/1761123234    -> viable/strict/1761123234
2025-12-04T09:33:42.1471352Z  * [new tag]                 viable/strict/1761126621    -> viable/strict/1761126621
2025-12-04T09:33:42.1472424Z  * [new tag]                 viable/strict/1761132259    -> viable/strict/1761132259
2025-12-04T09:33:42.1473518Z  * [new tag]                 viable/strict/1761146746    -> viable/strict/1761146746
2025-12-04T09:33:42.1474560Z  * [new tag]                 viable/strict/1761164752    -> viable/strict/1761164752
2025-12-04T09:33:42.1475689Z  * [new tag]                 viable/strict/1761166198    -> viable/strict/1761166198
2025-12-04T09:33:42.1476843Z  * [new tag]                 viable/strict/1761175424    -> viable/strict/1761175424
2025-12-04T09:33:42.1477856Z  * [new tag]                 viable/strict/1761176983    -> viable/strict/1761176983
2025-12-04T09:33:42.1479101Z  * [new tag]                 viable/strict/1761179891    -> viable/strict/1761179891
2025-12-04T09:33:42.1480233Z  * [new tag]                 viable/strict/1761181930    -> viable/strict/1761181930
2025-12-04T09:33:42.1481298Z  * [new tag]                 viable/strict/1761184516    -> viable/strict/1761184516
2025-12-04T09:33:42.1482440Z  * [new tag]                 viable/strict/1761190179    -> viable/strict/1761190179
2025-12-04T09:33:42.1483587Z  * [new tag]                 viable/strict/1761193558    -> viable/strict/1761193558
2025-12-04T09:33:42.1484644Z  * [new tag]                 viable/strict/1761207990    -> viable/strict/1761207990
2025-12-04T09:33:42.1485774Z  * [new tag]                 viable/strict/1761229539    -> viable/strict/1761229539
2025-12-04T09:33:42.1487155Z  * [new tag]                 viable/strict/1761244031    -> viable/strict/1761244031
2025-12-04T09:33:42.1488250Z  * [new tag]                 viable/strict/1761248986    -> viable/strict/1761248986
2025-12-04T09:33:42.1489324Z  * [new tag]                 viable/strict/1761259791    -> viable/strict/1761259791
2025-12-04T09:33:42.1490380Z  * [new tag]                 viable/strict/1761266139    -> viable/strict/1761266139
2025-12-04T09:33:42.1491471Z  * [new tag]                 viable/strict/1761268316    -> viable/strict/1761268316
2025-12-04T09:33:42.1492517Z  * [new tag]                 viable/strict/1761273805    -> viable/strict/1761273805
2025-12-04T09:33:42.1493543Z  * [new tag]                 viable/strict/1761275261    -> viable/strict/1761275261
2025-12-04T09:33:42.1494684Z  * [new tag]                 viable/strict/1761277913    -> viable/strict/1761277913
2025-12-04T09:33:42.1495791Z  * [new tag]                 viable/strict/1761290701    -> viable/strict/1761290701
2025-12-04T09:33:42.1496938Z  * [new tag]                 viable/strict/1761294396    -> viable/strict/1761294396
2025-12-04T09:33:42.1498128Z  * [new tag]                 viable/strict/1761303047    -> viable/strict/1761303047
2025-12-04T09:33:42.1499215Z  * [new tag]                 viable/strict/1761335388    -> viable/strict/1761335388
2025-12-04T09:33:42.1500294Z  * [new tag]                 viable/strict/1761337551    -> viable/strict/1761337551
2025-12-04T09:33:42.1501580Z  * [new tag]                 viable/strict/1761339007    -> viable/strict/1761339007
2025-12-04T09:33:42.1502580Z  * [new tag]                 viable/strict/1761341050    -> viable/strict/1761341050
2025-12-04T09:33:42.1503680Z  * [new tag]                 viable/strict/1761346188    -> viable/strict/1761346188
2025-12-04T09:33:42.1504876Z  * [new tag]                 viable/strict/1761349792    -> viable/strict/1761349792
2025-12-04T09:33:42.1505937Z  * [new tag]                 viable/strict/1761352620    -> viable/strict/1761352620
2025-12-04T09:33:42.1506976Z  * [new tag]                 viable/strict/1761354730    -> viable/strict/1761354730
2025-12-04T09:33:42.1508087Z  * [new tag]                 viable/strict/1761357298    -> viable/strict/1761357298
2025-12-04T09:33:42.1509177Z  * [new tag]                 viable/strict/1761360201    -> viable/strict/1761360201
2025-12-04T09:33:42.1510278Z  * [new tag]                 viable/strict/1761361753    -> viable/strict/1761361753
2025-12-04T09:33:42.1511328Z  * [new tag]                 viable/strict/1761364351    -> viable/strict/1761364351
2025-12-04T09:33:42.1512393Z  * [new tag]                 viable/strict/1761366338    -> viable/strict/1761366338
2025-12-04T09:33:42.1513657Z  * [new tag]                 viable/strict/1761367802    -> viable/strict/1761367802
2025-12-04T09:33:42.1514718Z  * [new tag]                 viable/strict/1761369889    -> viable/strict/1761369889
2025-12-04T09:33:42.1515866Z  * [new tag]                 viable/strict/1761371385    -> viable/strict/1761371385
2025-12-04T09:33:42.1516940Z  * [new tag]                 viable/strict/1761373581    -> viable/strict/1761373581
2025-12-04T09:33:42.1518145Z  * [new tag]                 viable/strict/1761375054    -> viable/strict/1761375054
2025-12-04T09:33:42.1519264Z  * [new tag]                 viable/strict/1761421785    -> viable/strict/1761421785
2025-12-04T09:33:42.1520471Z  * [new tag]                 viable/strict/1761434614    -> viable/strict/1761434614
2025-12-04T09:33:42.1521954Z  * [new tag]                 viable/strict/1761439254    -> viable/strict/1761439254
2025-12-04T09:33:42.1523281Z  * [new tag]                 viable/strict/1761454187    -> viable/strict/1761454187
2025-12-04T09:33:42.1524523Z  * [new tag]                 viable/strict/1761459991    -> viable/strict/1761459991
2025-12-04T09:33:42.1525785Z  * [new tag]                 viable/strict/1761470668    -> viable/strict/1761470668
2025-12-04T09:33:42.1527301Z  * [new tag]                 viable/strict/1761472188    -> viable/strict/1761472188
2025-12-04T09:33:42.1528456Z  * [new tag]                 viable/strict/1761503178    -> viable/strict/1761503178
2025-12-04T09:33:42.1529534Z  * [new tag]                 viable/strict/1761517492    -> viable/strict/1761517492
2025-12-04T09:33:42.1530618Z  * [new tag]                 viable/strict/1761518981    -> viable/strict/1761518981
2025-12-04T09:33:42.1531738Z  * [new tag]                 viable/strict/1761533609    -> viable/strict/1761533609
2025-12-04T09:33:42.1532626Z  * [new tag]                 viable/strict/1761546438    -> viable/strict/1761546438
2025-12-04T09:33:42.1534311Z  * [new tag]                 viable/strict/1761548133    -> viable/strict/1761548133
2025-12-04T09:33:42.1535731Z  * [new tag]                 viable/strict/1761555186    -> viable/strict/1761555186
2025-12-04T09:33:42.1536871Z  * [new tag]                 viable/strict/1761557178    -> viable/strict/1761557178
2025-12-04T09:33:42.1537932Z  * [new tag]                 viable/strict/1761560772    -> viable/strict/1761560772
2025-12-04T09:33:42.1539023Z  * [new tag]                 viable/strict/1761562266    -> viable/strict/1761562266
2025-12-04T09:33:42.1540208Z  * [new tag]                 viable/strict/1761564260    -> viable/strict/1761564260
2025-12-04T09:33:42.1541240Z  * [new tag]                 viable/strict/1761568072    -> viable/strict/1761568072
2025-12-04T09:33:42.1542286Z  * [new tag]                 viable/strict/1761571683    -> viable/strict/1761571683
2025-12-04T09:33:42.1543197Z  * [new tag]                 viable/strict/1761580199    -> viable/strict/1761580199
2025-12-04T09:33:42.1544310Z  * [new tag]                 viable/strict/1761587383    -> viable/strict/1761587383
2025-12-04T09:33:42.1545418Z  * [new tag]                 viable/strict/1761591165    -> viable/strict/1761591165
2025-12-04T09:33:42.1546484Z  * [new tag]                 viable/strict/1761594575    -> viable/strict/1761594575
2025-12-04T09:33:42.1547558Z  * [new tag]                 viable/strict/1761596710    -> viable/strict/1761596710
2025-12-04T09:33:42.1548724Z  * [new tag]                 viable/strict/1761598189    -> viable/strict/1761598189
2025-12-04T09:33:42.1549753Z  * [new tag]                 viable/strict/1761600254    -> viable/strict/1761600254
2025-12-04T09:33:42.1550823Z  * [new tag]                 viable/strict/1761603879    -> viable/strict/1761603879
2025-12-04T09:33:42.1551936Z  * [new tag]                 viable/strict/1761605429    -> viable/strict/1761605429
2025-12-04T09:33:42.1553131Z  * [new tag]                 viable/strict/1761607468    -> viable/strict/1761607468
2025-12-04T09:33:42.1554239Z  * [new tag]                 viable/strict/1761608983    -> viable/strict/1761608983
2025-12-04T09:33:42.1555329Z  * [new tag]                 viable/strict/1761611846    -> viable/strict/1761611846
2025-12-04T09:33:42.1556472Z  * [new tag]                 viable/strict/1761613922    -> viable/strict/1761613922
2025-12-04T09:33:42.1557315Z  * [new tag]                 viable/strict/1761616504    -> viable/strict/1761616504
2025-12-04T09:33:42.1558303Z  * [new tag]                 viable/strict/1761619599    -> viable/strict/1761619599
2025-12-04T09:33:42.1559509Z  * [new tag]                 viable/strict/1761686693    -> viable/strict/1761686693
2025-12-04T09:33:42.1560579Z  * [new tag]                 viable/strict/1761688179    -> viable/strict/1761688179
2025-12-04T09:33:42.1561611Z  * [new tag]                 viable/strict/1761691973    -> viable/strict/1761691973
2025-12-04T09:33:42.1562978Z  * [new tag]                 viable/strict/1761693884    -> viable/strict/1761693884
2025-12-04T09:33:42.1564066Z  * [new tag]                 viable/strict/1761695389    -> viable/strict/1761695389
2025-12-04T09:33:42.1565148Z  * [new tag]                 viable/strict/1761698408    -> viable/strict/1761698408
2025-12-04T09:33:42.1566216Z  * [new tag]                 viable/strict/1761702931    -> viable/strict/1761702931
2025-12-04T09:33:42.1567313Z  * [new tag]                 viable/strict/1761706307    -> viable/strict/1761706307
2025-12-04T09:33:42.1568479Z  * [new tag]                 viable/strict/1761709065    -> viable/strict/1761709065
2025-12-04T09:33:42.1569667Z  * [new tag]                 viable/strict/1761710285    -> viable/strict/1761710285
2025-12-04T09:33:42.1570828Z  * [new tag]                 viable/strict/1761711983    -> viable/strict/1761711983
2025-12-04T09:33:42.1571955Z  * [new tag]                 viable/strict/1761713514    -> viable/strict/1761713514
2025-12-04T09:33:42.1573201Z  * [new tag]                 viable/strict/1761715523    -> viable/strict/1761715523
2025-12-04T09:33:42.1574311Z  * [new tag]                 viable/strict/1761727973    -> viable/strict/1761727973
2025-12-04T09:33:42.1575464Z  * [new tag]                 viable/strict/1761751558    -> viable/strict/1761751558
2025-12-04T09:33:42.1576556Z  * [new tag]                 viable/strict/1761755187    -> viable/strict/1761755187
2025-12-04T09:33:42.1577745Z  * [new tag]                 viable/strict/1761756826    -> viable/strict/1761756826
2025-12-04T09:33:42.1578860Z  * [new tag]                 viable/strict/1761769551    -> viable/strict/1761769551
2025-12-04T09:33:42.1580060Z  * [new tag]                 viable/strict/1761771032    -> viable/strict/1761771032
2025-12-04T09:33:42.1581063Z  * [new tag]                 viable/strict/1761773101    -> viable/strict/1761773101
2025-12-04T09:33:42.1582141Z  * [new tag]                 viable/strict/1761781792    -> viable/strict/1761781792
2025-12-04T09:33:42.1583417Z  * [new tag]                 viable/strict/1761784788    -> viable/strict/1761784788
2025-12-04T09:33:42.1584403Z  * [new tag]                 viable/strict/1761786740    -> viable/strict/1761786740
2025-12-04T09:33:42.1585620Z  * [new tag]                 viable/strict/1761789332    -> viable/strict/1761789332
2025-12-04T09:33:42.1587273Z  * [new tag]                 viable/strict/1761792569    -> viable/strict/1761792569
2025-12-04T09:33:42.1588422Z  * [new tag]                 viable/strict/1761795289    -> viable/strict/1761795289
2025-12-04T09:33:42.1589514Z  * [new tag]                 viable/strict/1761798345    -> viable/strict/1761798345
2025-12-04T09:33:42.1590587Z  * [new tag]                 viable/strict/1761799827    -> viable/strict/1761799827
2025-12-04T09:33:42.1591776Z  * [new tag]                 viable/strict/1761805604    -> viable/strict/1761805604
2025-12-04T09:33:42.1592897Z  * [new tag]                 viable/strict/1761807202    -> viable/strict/1761807202
2025-12-04T09:33:42.1593996Z  * [new tag]                 viable/strict/1761809094    -> viable/strict/1761809094
2025-12-04T09:33:42.1595124Z  * [new tag]                 viable/strict/1761810576    -> viable/strict/1761810576
2025-12-04T09:33:42.1596423Z  * [new tag]                 viable/strict/1761812771    -> viable/strict/1761812771
2025-12-04T09:33:42.1597562Z  * [new tag]                 viable/strict/1761814363    -> viable/strict/1761814363
2025-12-04T09:33:42.1598613Z  * [new tag]                 viable/strict/1761857410    -> viable/strict/1761857410
2025-12-04T09:33:42.1599735Z  * [new tag]                 viable/strict/1761860985    -> viable/strict/1761860985
2025-12-04T09:33:42.1600944Z  * [new tag]                 viable/strict/1761863094    -> viable/strict/1761863094
2025-12-04T09:33:42.1605848Z  * [new tag]                 viable/strict/1761864590    -> viable/strict/1761864590
2025-12-04T09:33:42.1607010Z  * [new tag]                 viable/strict/1761866675    -> viable/strict/1761866675
2025-12-04T09:33:42.1608380Z  * [new tag]                 viable/strict/1761868178    -> viable/strict/1761868178
2025-12-04T09:33:42.1609570Z  * [new tag]                 viable/strict/1761871111    -> viable/strict/1761871111
2025-12-04T09:33:42.1611164Z  * [new tag]                 viable/strict/1761873126    -> viable/strict/1761873126
2025-12-04T09:33:42.1612279Z  * [new tag]                 viable/strict/1761875714    -> viable/strict/1761875714
2025-12-04T09:33:42.1613464Z  * [new tag]                 viable/strict/1761878924    -> viable/strict/1761878924
2025-12-04T09:33:42.1614625Z  * [new tag]                 viable/strict/1761881727    -> viable/strict/1761881727
2025-12-04T09:33:42.1615870Z  * [new tag]                 viable/strict/1761882959    -> viable/strict/1761882959
2025-12-04T09:33:42.1616864Z  * [new tag]                 viable/strict/1761886268    -> viable/strict/1761886268
2025-12-04T09:33:42.1617986Z  * [new tag]                 viable/strict/1761893641    -> viable/strict/1761893641
2025-12-04T09:33:42.1619143Z  * [new tag]                 viable/strict/1761931517    -> viable/strict/1761931517
2025-12-04T09:33:42.1620291Z  * [new tag]                 viable/strict/1761933080    -> viable/strict/1761933080
2025-12-04T09:33:42.1621379Z  * [new tag]                 viable/strict/1761935217    -> viable/strict/1761935217
2025-12-04T09:33:42.1622537Z  * [new tag]                 viable/strict/1761938533    -> viable/strict/1761938533
2025-12-04T09:33:42.1623708Z  * [new tag]                 viable/strict/1761940184    -> viable/strict/1761940184
2025-12-04T09:33:42.1624797Z  * [new tag]                 viable/strict/1761942338    -> viable/strict/1761942338
2025-12-04T09:33:42.1628236Z  * [new tag]                 viable/strict/1761946100    -> viable/strict/1761946100
2025-12-04T09:33:42.1628492Z  * [new tag]                 viable/strict/1761947374    -> viable/strict/1761947374
2025-12-04T09:33:42.1628789Z  * [new tag]                 viable/strict/1761950978    -> viable/strict/1761950978
2025-12-04T09:33:42.1630214Z  * [new tag]                 viable/strict/1761957727    -> viable/strict/1761957727
2025-12-04T09:33:42.1630455Z  * [new tag]                 viable/strict/1761959532    -> viable/strict/1761959532
2025-12-04T09:33:42.1631541Z  * [new tag]                 viable/strict/1761965366    -> viable/strict/1761965366
2025-12-04T09:33:42.1632811Z  * [new tag]                 viable/strict/1761968066    -> viable/strict/1761968066
2025-12-04T09:33:42.1633968Z  * [new tag]                 viable/strict/1761969322    -> viable/strict/1761969322
2025-12-04T09:33:42.1635114Z  * [new tag]                 viable/strict/1761974723    -> viable/strict/1761974723
2025-12-04T09:33:42.1636205Z  * [new tag]                 viable/strict/1761981837    -> viable/strict/1761981837
2025-12-04T09:33:42.1637462Z  * [new tag]                 viable/strict/1761985546    -> viable/strict/1761985546
2025-12-04T09:33:42.1638632Z  * [new tag]                 viable/strict/1761987030    -> viable/strict/1761987030
2025-12-04T09:33:42.1639878Z  * [new tag]                 viable/strict/1762003554    -> viable/strict/1762003554
2025-12-04T09:33:42.1640875Z  * [new tag]                 viable/strict/1762021560    -> viable/strict/1762021560
2025-12-04T09:33:42.1642062Z  * [new tag]                 viable/strict/1762032190    -> viable/strict/1762032190
2025-12-04T09:33:42.1643343Z  * [new tag]                 viable/strict/1762040981    -> viable/strict/1762040981
2025-12-04T09:33:42.1644533Z  * [new tag]                 viable/strict/1762048525    -> viable/strict/1762048525
2025-12-04T09:33:42.1645658Z  * [new tag]                 viable/strict/1762104223    -> viable/strict/1762104223
2025-12-04T09:33:42.1646732Z  * [new tag]                 viable/strict/1762105778    -> viable/strict/1762105778
2025-12-04T09:33:42.1647929Z  * [new tag]                 viable/strict/1762115109    -> viable/strict/1762115109
2025-12-04T09:33:42.1648994Z  * [new tag]                 viable/strict/1762125840    -> viable/strict/1762125840
2025-12-04T09:33:42.1649913Z  * [new tag]                 viable/strict/1762127377    -> viable/strict/1762127377
2025-12-04T09:33:42.1651447Z  * [new tag]                 viable/strict/1762134925    -> viable/strict/1762134925
2025-12-04T09:33:42.1652626Z  * [new tag]                 viable/strict/1762138338    -> viable/strict/1762138338
2025-12-04T09:33:42.1653748Z  * [new tag]                 viable/strict/1762148993    -> viable/strict/1762148993
2025-12-04T09:33:42.1654865Z  * [new tag]                 viable/strict/1762152871    -> viable/strict/1762152871
2025-12-04T09:33:42.1656005Z  * [new tag]                 viable/strict/1762156183    -> viable/strict/1762156183
2025-12-04T09:33:42.1657124Z  * [new tag]                 viable/strict/1762163457    -> viable/strict/1762163457
2025-12-04T09:33:42.1658236Z  * [new tag]                 viable/strict/1762165569    -> viable/strict/1762165569
2025-12-04T09:33:42.1659356Z  * [new tag]                 viable/strict/1762169035    -> viable/strict/1762169035
2025-12-04T09:33:42.1660482Z  * [new tag]                 viable/strict/1762174936    -> viable/strict/1762174936
2025-12-04T09:33:42.1661617Z  * [new tag]                 viable/strict/1762194412    -> viable/strict/1762194412
2025-12-04T09:33:42.1662683Z  * [new tag]                 viable/strict/1762195876    -> viable/strict/1762195876
2025-12-04T09:33:42.1663827Z  * [new tag]                 viable/strict/1762197788    -> viable/strict/1762197788
2025-12-04T09:33:42.1664980Z  * [new tag]                 viable/strict/1762199389    -> viable/strict/1762199389
2025-12-04T09:33:42.1666331Z  * [new tag]                 viable/strict/1762206585    -> viable/strict/1762206585
2025-12-04T09:33:42.1667578Z  * [new tag]                 viable/strict/1762210184    -> viable/strict/1762210184
2025-12-04T09:33:42.1668496Z  * [new tag]                 viable/strict/1762218736    -> viable/strict/1762218736
2025-12-04T09:33:42.1669716Z  * [new tag]                 viable/strict/1762224529    -> viable/strict/1762224529
2025-12-04T09:33:42.1671010Z  * [new tag]                 viable/strict/1762227253    -> viable/strict/1762227253
2025-12-04T09:33:42.1671854Z  * [new tag]                 viable/strict/1762228515    -> viable/strict/1762228515
2025-12-04T09:33:42.1673097Z  * [new tag]                 viable/strict/1762230349    -> viable/strict/1762230349
2025-12-04T09:33:42.1674410Z  * [new tag]                 viable/strict/1762231859    -> viable/strict/1762231859
2025-12-04T09:33:42.1675653Z  * [new tag]                 viable/strict/1762233925    -> viable/strict/1762233925
2025-12-04T09:33:42.1676881Z  * [new tag]                 viable/strict/1762237630    -> viable/strict/1762237630
2025-12-04T09:33:42.1677848Z  * [new tag]                 viable/strict/1762253522    -> viable/strict/1762253522
2025-12-04T09:33:42.1679101Z  * [new tag]                 viable/strict/1762278588    -> viable/strict/1762278588
2025-12-04T09:33:42.1680315Z  * [new tag]                 viable/strict/1762284203    -> viable/strict/1762284203
2025-12-04T09:33:42.1681504Z  * [new tag]                 viable/strict/1762289446    -> viable/strict/1762289446
2025-12-04T09:33:42.1682572Z  * [new tag]                 viable/strict/1762291515    -> viable/strict/1762291515
2025-12-04T09:33:42.1683821Z  * [new tag]                 viable/strict/1762295100    -> viable/strict/1762295100
2025-12-04T09:33:42.1685317Z  * [new tag]                 viable/strict/1762296590    -> viable/strict/1762296590
2025-12-04T09:33:42.1686250Z  * [new tag]                 viable/strict/1762300179    -> viable/strict/1762300179
2025-12-04T09:33:42.1687207Z  * [new tag]                 viable/strict/1762303207    -> viable/strict/1762303207
2025-12-04T09:33:42.1688445Z  * [new tag]                 viable/strict/1762386584    -> viable/strict/1762386584
2025-12-04T09:33:42.1689626Z  * [new tag]                 viable/strict/1762391537    -> viable/strict/1762391537
2025-12-04T09:33:42.1690528Z  * [new tag]                 viable/strict/1762394119    -> viable/strict/1762394119
2025-12-04T09:33:42.1692063Z  * [new tag]                 viable/strict/1762397437    -> viable/strict/1762397437
2025-12-04T09:33:42.1693255Z  * [new tag]                 viable/strict/1762400256    -> viable/strict/1762400256
2025-12-04T09:33:42.1694417Z  * [new tag]                 viable/strict/1762401469    -> viable/strict/1762401469
2025-12-04T09:33:42.1695610Z  * [new tag]                 viable/strict/1762408195    -> viable/strict/1762408195
2025-12-04T09:33:42.1696775Z  * [new tag]                 viable/strict/1762410411    -> viable/strict/1762410411
2025-12-04T09:33:42.1697965Z  * [new tag]                 viable/strict/1762417613    -> viable/strict/1762417613
2025-12-04T09:33:42.1699154Z  * [new tag]                 viable/strict/1762419198    -> viable/strict/1762419198
2025-12-04T09:33:42.1700264Z  * [new tag]                 viable/strict/1762422656    -> viable/strict/1762422656
2025-12-04T09:33:42.1702050Z  * [new tag]                 viable/strict/1762424746    -> viable/strict/1762424746
2025-12-04T09:33:42.1703285Z  * [new tag]                 viable/strict/1762446386    -> viable/strict/1762446386
2025-12-04T09:33:42.1704528Z  * [new tag]                 viable/strict/1762449912    -> viable/strict/1762449912
2025-12-04T09:33:42.1705639Z  * [new tag]                 viable/strict/1762457031    -> viable/strict/1762457031
2025-12-04T09:33:42.1706771Z  * [new tag]                 viable/strict/1762462441    -> viable/strict/1762462441
2025-12-04T09:33:42.1707924Z  * [new tag]                 viable/strict/1762467909    -> viable/strict/1762467909
2025-12-04T09:33:42.1709190Z  * [new tag]                 viable/strict/1762471493    -> viable/strict/1762471493
2025-12-04T09:33:42.1710321Z  * [new tag]                 viable/strict/1762475990    -> viable/strict/1762475990
2025-12-04T09:33:42.1711515Z  * [new tag]                 viable/strict/1762477933    -> viable/strict/1762477933
2025-12-04T09:33:42.1712763Z  * [new tag]                 viable/strict/1762491053    -> viable/strict/1762491053
2025-12-04T09:33:42.1714109Z  * [new tag]                 viable/strict/1762493118    -> viable/strict/1762493118
2025-12-04T09:33:42.1715050Z  * [new tag]                 viable/strict/1762498442    -> viable/strict/1762498442
2025-12-04T09:33:42.1716204Z  * [new tag]                 viable/strict/1762501778    -> viable/strict/1762501778
2025-12-04T09:33:42.1717358Z  * [new tag]                 viable/strict/1762504001    -> viable/strict/1762504001
2025-12-04T09:33:42.1718644Z  * [new tag]                 viable/strict/1762505583    -> viable/strict/1762505583
2025-12-04T09:33:42.1719840Z  * [new tag]                 viable/strict/1762507523    -> viable/strict/1762507523
2025-12-04T09:33:42.1721046Z  * [new tag]                 viable/strict/1762511140    -> viable/strict/1762511140
2025-12-04T09:33:42.1722677Z  * [new tag]                 viable/strict/1762512632    -> viable/strict/1762512632
2025-12-04T09:33:42.1723938Z  * [new tag]                 viable/strict/1762520467    -> viable/strict/1762520467
2025-12-04T09:33:42.1725136Z  * [new tag]                 viable/strict/1762522016    -> viable/strict/1762522016
2025-12-04T09:33:42.1726239Z  * [new tag]                 viable/strict/1762530591    -> viable/strict/1762530591
2025-12-04T09:33:42.1727420Z  * [new tag]                 viable/strict/1762543405    -> viable/strict/1762543405
2025-12-04T09:33:42.1728313Z  * [new tag]                 viable/strict/1762544998    -> viable/strict/1762544998
2025-12-04T09:33:42.1729477Z  * [new tag]                 viable/strict/1762552182    -> viable/strict/1762552182
2025-12-04T09:33:42.1730603Z  * [new tag]                 viable/strict/1762554297    -> viable/strict/1762554297
2025-12-04T09:33:42.1731529Z  * [new tag]                 viable/strict/1762559381    -> viable/strict/1762559381
2025-12-04T09:33:42.1732760Z  * [new tag]                 viable/strict/1762562222    -> viable/strict/1762562222
2025-12-04T09:33:42.1733908Z  * [new tag]                 viable/strict/1762564319    -> viable/strict/1762564319
2025-12-04T09:33:42.1734829Z  * [new tag]                 viable/strict/1762566904    -> viable/strict/1762566904
2025-12-04T09:33:42.1736009Z  * [new tag]                 viable/strict/1762569781    -> viable/strict/1762569781
2025-12-04T09:33:42.1737098Z  * [new tag]                 viable/strict/1762575940    -> viable/strict/1762575940
2025-12-04T09:33:42.1738257Z  * [new tag]                 viable/strict/1762580974    -> viable/strict/1762580974
2025-12-04T09:33:42.1739436Z  * [new tag]                 viable/strict/1762583185    -> viable/strict/1762583185
2025-12-04T09:33:42.1740596Z  * [new tag]                 viable/strict/1762586647    -> viable/strict/1762586647
2025-12-04T09:33:42.1741708Z  * [new tag]                 viable/strict/1762588183    -> viable/strict/1762588183
2025-12-04T09:33:42.1742860Z  * [new tag]                 viable/strict/1762593886    -> viable/strict/1762593886
2025-12-04T09:33:42.1744008Z  * [new tag]                 viable/strict/1762650743    -> viable/strict/1762650743
2025-12-04T09:33:42.1745258Z  * [new tag]                 viable/strict/1762653328    -> viable/strict/1762653328
2025-12-04T09:33:42.1746395Z  * [new tag]                 viable/strict/1762659342    -> viable/strict/1762659342
2025-12-04T09:33:42.1747527Z  * [new tag]                 viable/strict/1762662360    -> viable/strict/1762662360
2025-12-04T09:33:42.1748686Z  * [new tag]                 viable/strict/1762667377    -> viable/strict/1762667377
2025-12-04T09:33:42.1749928Z  * [new tag]                 viable/strict/1762671090    -> viable/strict/1762671090
2025-12-04T09:33:42.1751090Z  * [new tag]                 viable/strict/1762680284    -> viable/strict/1762680284
2025-12-04T09:33:42.1752223Z  * [new tag]                 viable/strict/1762683900    -> viable/strict/1762683900
2025-12-04T09:33:42.1753344Z  * [new tag]                 viable/strict/1762705541    -> viable/strict/1762705541
2025-12-04T09:33:42.1754477Z  * [new tag]                 viable/strict/1762709004    -> viable/strict/1762709004
2025-12-04T09:33:42.1755713Z  * [new tag]                 viable/strict/1762746004    -> viable/strict/1762746004
2025-12-04T09:33:42.1756935Z  * [new tag]                 viable/strict/1762748799    -> viable/strict/1762748799
2025-12-04T09:33:42.1758052Z  * [new tag]                 viable/strict/1762759504    -> viable/strict/1762759504
2025-12-04T09:33:42.1759282Z  * [new tag]                 viable/strict/1762760973    -> viable/strict/1762760973
2025-12-04T09:33:42.1760809Z  * [new tag]                 viable/strict/1762775374    -> viable/strict/1762775374
2025-12-04T09:33:42.1762036Z  * [new tag]                 viable/strict/1762777661    -> viable/strict/1762777661
2025-12-04T09:33:42.1763325Z  * [new tag]                 viable/strict/1762779774    -> viable/strict/1762779774
2025-12-04T09:33:42.1764644Z  * [new tag]                 viable/strict/1762781259    -> viable/strict/1762781259
2025-12-04T09:33:42.1765932Z  * [new tag]                 viable/strict/1762793628    -> viable/strict/1762793628
2025-12-04T09:33:42.1767119Z  * [new tag]                 viable/strict/1762800711    -> viable/strict/1762800711
2025-12-04T09:33:42.1768272Z  * [new tag]                 viable/strict/1762809894    -> viable/strict/1762809894
2025-12-04T09:33:42.1769424Z  * [new tag]                 viable/strict/1762811384    -> viable/strict/1762811384
2025-12-04T09:33:42.1770665Z  * [new tag]                 viable/strict/1762813841    -> viable/strict/1762813841
2025-12-04T09:33:42.1771832Z  * [new tag]                 viable/strict/1762815047    -> viable/strict/1762815047
2025-12-04T09:33:42.1773086Z  * [new tag]                 viable/strict/1762817094    -> viable/strict/1762817094
2025-12-04T09:33:42.1774301Z  * [new tag]                 viable/strict/1762818582    -> viable/strict/1762818582
2025-12-04T09:33:42.1775462Z  * [new tag]                 viable/strict/1762821623    -> viable/strict/1762821623
2025-12-04T09:33:42.1776374Z  * [new tag]                 viable/strict/1762823531    -> viable/strict/1762823531
2025-12-04T09:33:42.1777614Z  * [new tag]                 viable/strict/1762849583    -> viable/strict/1762849583
2025-12-04T09:33:42.1778801Z  * [new tag]                 viable/strict/1762851200    -> viable/strict/1762851200
2025-12-04T09:33:42.1779922Z  * [new tag]                 viable/strict/1762854603    -> viable/strict/1762854603
2025-12-04T09:33:42.1781073Z  * [new tag]                 viable/strict/1762858276    -> viable/strict/1762858276
2025-12-04T09:33:42.1782338Z  * [new tag]                 viable/strict/1762860891    -> viable/strict/1762860891
2025-12-04T09:33:42.1784112Z  * [new tag]                 viable/strict/1762866174    -> viable/strict/1762866174
2025-12-04T09:33:42.1785253Z  * [new tag]                 viable/strict/1762867653    -> viable/strict/1762867653
2025-12-04T09:33:42.1786412Z  * [new tag]                 viable/strict/1762872669    -> viable/strict/1762872669
2025-12-04T09:33:42.1787309Z  * [new tag]                 viable/strict/1762878380    -> viable/strict/1762878380
2025-12-04T09:33:42.1788659Z  * [new tag]                 viable/strict/1762889003    -> viable/strict/1762889003
2025-12-04T09:33:42.1789813Z  * [new tag]                 viable/strict/1762890589    -> viable/strict/1762890589
2025-12-04T09:33:42.1791008Z  * [new tag]                 viable/strict/1762892743    -> viable/strict/1762892743
2025-12-04T09:33:42.1792174Z  * [new tag]                 viable/strict/1762894271    -> viable/strict/1762894271
2025-12-04T09:33:42.1793142Z  * [new tag]                 viable/strict/1762896287    -> viable/strict/1762896287
2025-12-04T09:33:42.1794301Z  * [new tag]                 viable/strict/1762915871    -> viable/strict/1762915871
2025-12-04T09:33:42.1795479Z  * [new tag]                 viable/strict/1762918569    -> viable/strict/1762918569
2025-12-04T09:33:42.1796441Z  * [new tag]                 viable/strict/1762919776    -> viable/strict/1762919776
2025-12-04T09:33:42.1797644Z  * [new tag]                 viable/strict/1762923072    -> viable/strict/1762923072
2025-12-04T09:33:42.1798918Z  * [new tag]                 viable/strict/1762928826    -> viable/strict/1762928826
2025-12-04T09:33:42.1800109Z  * [new tag]                 viable/strict/1762930451    -> viable/strict/1762930451
2025-12-04T09:33:42.1801373Z  * [new tag]                 viable/strict/1762933780    -> viable/strict/1762933780
2025-12-04T09:33:42.1802695Z  * [new tag]                 viable/strict/1762937638    -> viable/strict/1762937638
2025-12-04T09:33:42.1804010Z  * [new tag]                 viable/strict/1762939545    -> viable/strict/1762939545
2025-12-04T09:33:42.1805225Z  * [new tag]                 viable/strict/1762962692    -> viable/strict/1762962692
2025-12-04T09:33:42.1806433Z  * [new tag]                 viable/strict/1762979143    -> viable/strict/1762979143
2025-12-04T09:33:42.1807577Z  * [new tag]                 viable/strict/1762984188    -> viable/strict/1762984188
2025-12-04T09:33:42.1808503Z  * [new tag]                 viable/strict/1762986306    -> viable/strict/1762986306
2025-12-04T09:33:42.1809719Z  * [new tag]                 viable/strict/1762989903    -> viable/strict/1762989903
2025-12-04T09:33:42.1810891Z  * [new tag]                 viable/strict/1762991377    -> viable/strict/1762991377
2025-12-04T09:33:42.1812021Z  * [new tag]                 viable/strict/1762998921    -> viable/strict/1762998921
2025-12-04T09:33:42.1813354Z  * [new tag]                 viable/strict/1763002287    -> viable/strict/1763002287
2025-12-04T09:33:42.1814498Z  * [new tag]                 viable/strict/1763016840    -> viable/strict/1763016840
2025-12-04T09:33:42.1815654Z  * [new tag]                 viable/strict/1763020180    -> viable/strict/1763020180
2025-12-04T09:33:42.1816838Z  * [new tag]                 viable/strict/1763027421    -> viable/strict/1763027421
2025-12-04T09:33:42.1818036Z  * [new tag]                 viable/strict/1763031120    -> viable/strict/1763031120
2025-12-04T09:33:42.1819245Z  * [new tag]                 viable/strict/1763036861    -> viable/strict/1763036861
2025-12-04T09:33:42.1820408Z  * [new tag]                 viable/strict/1763038993    -> viable/strict/1763038993
2025-12-04T09:33:42.1821653Z  * [new tag]                 viable/strict/1763054703    -> viable/strict/1763054703
2025-12-04T09:33:42.1822896Z  * [new tag]                 viable/strict/1763067061    -> viable/strict/1763067061
2025-12-04T09:33:42.1823708Z  * [new tag]                 viable/strict/1763070847    -> viable/strict/1763070847
2025-12-04T09:33:42.1824986Z  * [new tag]                 viable/strict/1763072706    -> viable/strict/1763072706
2025-12-04T09:33:42.1826308Z  * [new tag]                 viable/strict/1763076302    -> viable/strict/1763076302
2025-12-04T09:33:42.1827499Z  * [new tag]                 viable/strict/1763080816    -> viable/strict/1763080816
2025-12-04T09:33:42.1828622Z  * [new tag]                 viable/strict/1763082732    -> viable/strict/1763082732
2025-12-04T09:33:42.1829732Z  * [new tag]                 viable/strict/1763085329    -> viable/strict/1763085329
2025-12-04T09:33:42.1830883Z  * [new tag]                 viable/strict/1763088623    -> viable/strict/1763088623
2025-12-04T09:33:42.1832247Z  * [new tag]                 viable/strict/1763091402    -> viable/strict/1763091402
2025-12-04T09:33:42.1833332Z  * [new tag]                 viable/strict/1763092602    -> viable/strict/1763092602
2025-12-04T09:33:42.1834473Z  * [new tag]                 viable/strict/1763094355    -> viable/strict/1763094355
2025-12-04T09:33:42.1835624Z  * [new tag]                 viable/strict/1763099390    -> viable/strict/1763099390
2025-12-04T09:33:42.1837229Z  * [new tag]                 viable/strict/1763101608    -> viable/strict/1763101608
2025-12-04T09:33:42.1838437Z  * [new tag]                 viable/strict/1763105102    -> viable/strict/1763105102
2025-12-04T09:33:42.1839655Z  * [new tag]                 viable/strict/1763112347    -> viable/strict/1763112347
2025-12-04T09:33:42.1840769Z  * [new tag]                 viable/strict/1763119471    -> viable/strict/1763119471
2025-12-04T09:33:42.1841903Z  * [new tag]                 viable/strict/1763126835    -> viable/strict/1763126835
2025-12-04T09:33:42.1842758Z  * [new tag]                 viable/strict/1763149779    -> viable/strict/1763149779
2025-12-04T09:33:42.1844084Z  * [new tag]                 viable/strict/1763164178    -> viable/strict/1763164178
2025-12-04T09:33:42.1845259Z  * [new tag]                 viable/strict/1763167104    -> viable/strict/1763167104
2025-12-04T09:33:42.1846358Z  * [new tag]                 viable/strict/1763169132    -> viable/strict/1763169132
2025-12-04T09:33:42.1847518Z  * [new tag]                 viable/strict/1763171708    -> viable/strict/1763171708
2025-12-04T09:33:42.1848638Z  * [new tag]                 viable/strict/1763174759    -> viable/strict/1763174759
2025-12-04T09:33:42.1849831Z  * [new tag]                 viable/strict/1763180744    -> viable/strict/1763180744
2025-12-04T09:33:42.1850968Z  * [new tag]                 viable/strict/1763182227    -> viable/strict/1763182227
2025-12-04T09:33:42.1852073Z  * [new tag]                 viable/strict/1763184309    -> viable/strict/1763184309
2025-12-04T09:33:42.1853940Z  * [new tag]                 viable/strict/1763187991    -> viable/strict/1763187991
2025-12-04T09:33:42.1855036Z  * [new tag]                 viable/strict/1763191445    -> viable/strict/1763191445
2025-12-04T09:33:42.1856409Z  * [new tag]                 viable/strict/1763195152    -> viable/strict/1763195152
2025-12-04T09:33:42.1857276Z  * [new tag]                 viable/strict/1763205769    -> viable/strict/1763205769
2025-12-04T09:33:42.1858588Z  * [new tag]                 viable/strict/1763246990    -> viable/strict/1763246990
2025-12-04T09:33:42.1859790Z  * [new tag]                 viable/strict/1763261578    -> viable/strict/1763261578
2025-12-04T09:33:42.1860807Z  * [new tag]                 viable/strict/1763286573    -> viable/strict/1763286573
2025-12-04T09:33:42.1861825Z  * [new tag]                 viable/strict/1763292167    -> viable/strict/1763292167
2025-12-04T09:33:42.1863037Z  * [new tag]                 viable/strict/1763333386    -> viable/strict/1763333386
2025-12-04T09:33:42.1864272Z  * [new tag]                 viable/strict/1763340082    -> viable/strict/1763340082
2025-12-04T09:33:42.1866237Z  * [new tag]                 viable/strict/1763364324    -> viable/strict/1763364324
2025-12-04T09:33:42.1867380Z  * [new tag]                 viable/strict/1763371569    -> viable/strict/1763371569
2025-12-04T09:33:42.1868563Z  * [new tag]                 viable/strict/1763373067    -> viable/strict/1763373067
2025-12-04T09:33:42.1869650Z  * [new tag]                 viable/strict/1763375157    -> viable/strict/1763375157
2025-12-04T09:33:42.1870819Z  * [new tag]                 viable/strict/1763382462    -> viable/strict/1763382462
2025-12-04T09:33:42.1872005Z  * [new tag]                 viable/strict/1763394661    -> viable/strict/1763394661
2025-12-04T09:33:42.1873367Z  * [new tag]                 viable/strict/1763396797    -> viable/strict/1763396797
2025-12-04T09:33:42.1874569Z  * [new tag]                 viable/strict/1763398542    -> viable/strict/1763398542
2025-12-04T09:33:42.1875763Z  * [new tag]                 viable/strict/1763401807    -> viable/strict/1763401807
2025-12-04T09:33:42.1876790Z  * [new tag]                 viable/strict/1763414698    -> viable/strict/1763414698
2025-12-04T09:33:42.1877925Z  * [new tag]                 viable/strict/1763419807    -> viable/strict/1763419807
2025-12-04T09:33:42.1879076Z  * [new tag]                 viable/strict/1763426369    -> viable/strict/1763426369
2025-12-04T09:33:42.1880314Z  * [new tag]                 viable/strict/1763428331    -> viable/strict/1763428331
2025-12-04T09:33:42.1881509Z  * [new tag]                 viable/strict/1763430922    -> viable/strict/1763430922
2025-12-04T09:33:42.1882520Z  * [new tag]                 viable/strict/1763434184    -> viable/strict/1763434184
2025-12-04T09:33:42.1883865Z  * [new tag]                 viable/strict/1763439973    -> viable/strict/1763439973
2025-12-04T09:33:42.1885113Z  * [new tag]                 viable/strict/1763444995    -> viable/strict/1763444995
2025-12-04T09:33:42.1886171Z  * [new tag]                 viable/strict/1763447206    -> viable/strict/1763447206
2025-12-04T09:33:42.1887372Z  * [new tag]                 viable/strict/1763448826    -> viable/strict/1763448826
2025-12-04T09:33:42.1888584Z  * [new tag]                 viable/strict/1763450717    -> viable/strict/1763450717
2025-12-04T09:33:42.1889736Z  * [new tag]                 viable/strict/1763452183    -> viable/strict/1763452183
2025-12-04T09:33:42.1890983Z  * [new tag]                 viable/strict/1763457945    -> viable/strict/1763457945
2025-12-04T09:33:42.1892106Z  * [new tag]                 viable/strict/1763459439    -> viable/strict/1763459439
2025-12-04T09:33:42.1893156Z  * [new tag]                 viable/strict/1763461556    -> viable/strict/1763461556
2025-12-04T09:33:42.1894234Z  * [new tag]                 viable/strict/1763463103    -> viable/strict/1763463103
2025-12-04T09:33:42.1895495Z  * [new tag]                 viable/strict/1763465100    -> viable/strict/1763465100
2025-12-04T09:33:42.1896375Z  * [new tag]                 viable/strict/1763468866    -> viable/strict/1763468866
2025-12-04T09:33:42.1897437Z  * [new tag]                 viable/strict/1763493823    -> viable/strict/1763493823
2025-12-04T09:33:42.1898362Z  * [new tag]                 viable/strict/1763496249    -> viable/strict/1763496249
2025-12-04T09:33:42.1899582Z  * [new tag]                 viable/strict/1763502620    -> viable/strict/1763502620
2025-12-04T09:33:42.1900800Z  * [new tag]                 viable/strict/1763504715    -> viable/strict/1763504715
2025-12-04T09:33:42.1902233Z  * [new tag]                 viable/strict/1763506208    -> viable/strict/1763506208
2025-12-04T09:33:42.1903403Z  * [new tag]                 viable/strict/1763520590    -> viable/strict/1763520590
2025-12-04T09:33:42.1904578Z  * [new tag]                 viable/strict/1763523357    -> viable/strict/1763523357
2025-12-04T09:33:42.1905822Z  * [new tag]                 viable/strict/1763529922    -> viable/strict/1763529922
2025-12-04T09:33:42.1907070Z  * [new tag]                 viable/strict/1763531408    -> viable/strict/1763531408
2025-12-04T09:33:42.1908248Z  * [new tag]                 viable/strict/1763533622    -> viable/strict/1763533622
2025-12-04T09:33:42.1909408Z  * [new tag]                 viable/strict/1763538576    -> viable/strict/1763538576
2025-12-04T09:33:42.1910615Z  * [new tag]                 viable/strict/1763545823    -> viable/strict/1763545823
2025-12-04T09:33:42.1911630Z  * [new tag]                 viable/strict/1763547951    -> viable/strict/1763547951
2025-12-04T09:33:42.1913286Z  * [new tag]                 viable/strict/1763551477    -> viable/strict/1763551477
2025-12-04T09:33:42.1914486Z  * [new tag]                 viable/strict/1763552982    -> viable/strict/1763552982
2025-12-04T09:33:42.1915668Z  * [new tag]                 viable/strict/1763594698    -> viable/strict/1763594698
2025-12-04T09:33:42.1916823Z  * [new tag]                 viable/strict/1763596178    -> viable/strict/1763596178
2025-12-04T09:33:42.1918032Z  * [new tag]                 viable/strict/1763599155    -> viable/strict/1763599155
2025-12-04T09:33:42.1919171Z  * [new tag]                 viable/strict/1763603717    -> viable/strict/1763603717
2025-12-04T09:33:42.1920314Z  * [new tag]                 viable/strict/1763606923    -> viable/strict/1763606923
2025-12-04T09:33:42.1921517Z  * [new tag]                 viable/strict/1763609715    -> viable/strict/1763609715
2025-12-04T09:33:42.1922719Z  * [new tag]                 viable/strict/1763612757    -> viable/strict/1763612757
2025-12-04T09:33:42.1923923Z  * [new tag]                 viable/strict/1763616325    -> viable/strict/1763616325
2025-12-04T09:33:42.1925058Z  * [new tag]                 viable/strict/1763623509    -> viable/strict/1763623509
2025-12-04T09:33:42.1926404Z  * [new tag]                 viable/strict/1763624984    -> viable/strict/1763624984
2025-12-04T09:33:42.1927643Z  * [new tag]                 viable/strict/1763628796    -> viable/strict/1763628796
2025-12-04T09:33:42.1928678Z  * [new tag]                 viable/strict/1763634343    -> viable/strict/1763634343
2025-12-04T09:33:42.1929805Z  * [new tag]                 viable/strict/1763635867    -> viable/strict/1763635867
2025-12-04T09:33:42.1931133Z  * [new tag]                 viable/strict/1763639382    -> viable/strict/1763639382
2025-12-04T09:33:42.1932281Z  * [new tag]                 viable/strict/1763646626    -> viable/strict/1763646626
2025-12-04T09:33:42.1933647Z  * [new tag]                 viable/strict/1763655997    -> viable/strict/1763655997
2025-12-04T09:33:42.1934764Z  * [new tag]                 viable/strict/1763659444    -> viable/strict/1763659444
2025-12-04T09:33:42.1935868Z  * [new tag]                 viable/strict/1763660992    -> viable/strict/1763660992
2025-12-04T09:33:42.1936956Z  * [new tag]                 viable/strict/1763663201    -> viable/strict/1763663201
2025-12-04T09:33:42.1938164Z  * [new tag]                 viable/strict/1763670362    -> viable/strict/1763670362
2025-12-04T09:33:42.1939196Z  * [new tag]                 viable/strict/1763675378    -> viable/strict/1763675378
2025-12-04T09:33:42.1940478Z  * [new tag]                 viable/strict/1763693343    -> viable/strict/1763693343
2025-12-04T09:33:42.1941597Z  * [new tag]                 viable/strict/1763696088    -> viable/strict/1763696088
2025-12-04T09:33:42.1942882Z  * [new tag]                 viable/strict/1763697343    -> viable/strict/1763697343
2025-12-04T09:33:42.1944110Z  * [new tag]                 viable/strict/1763699165    -> viable/strict/1763699165
2025-12-04T09:33:42.1945215Z  * [new tag]                 viable/strict/1763700660    -> viable/strict/1763700660
2025-12-04T09:33:42.1946341Z  * [new tag]                 viable/strict/1763704209    -> viable/strict/1763704209
2025-12-04T09:33:42.1947477Z  * [new tag]                 viable/strict/1763706411    -> viable/strict/1763706411
2025-12-04T09:33:42.1948610Z  * [new tag]                 viable/strict/1763708082    -> viable/strict/1763708082
2025-12-04T09:33:42.1949605Z  * [new tag]                 viable/strict/1763711381    -> viable/strict/1763711381
2025-12-04T09:33:42.1950669Z  * [new tag]                 viable/strict/1763713593    -> viable/strict/1763713593
2025-12-04T09:33:42.1951857Z  * [new tag]                 viable/strict/1763715201    -> viable/strict/1763715201
2025-12-04T09:33:42.1953003Z  * [new tag]                 viable/strict/1763733017    -> viable/strict/1763733017
2025-12-04T09:33:42.1954169Z  * [new tag]                 viable/strict/1763735108    -> viable/strict/1763735108
2025-12-04T09:33:42.1955280Z  * [new tag]                 viable/strict/1763749579    -> viable/strict/1763749579
2025-12-04T09:33:42.1956562Z  * [new tag]                 viable/strict/1763751113    -> viable/strict/1763751113
2025-12-04T09:33:42.1957632Z  * [new tag]                 viable/strict/1763753035    -> viable/strict/1763753035
2025-12-04T09:33:42.1958834Z  * [new tag]                 viable/strict/1763754578    -> viable/strict/1763754578
2025-12-04T09:33:42.1959954Z  * [new tag]                 viable/strict/1763756748    -> viable/strict/1763756748
2025-12-04T09:33:42.1961034Z  * [new tag]                 viable/strict/1763758205    -> viable/strict/1763758205
2025-12-04T09:33:42.1962018Z  * [new tag]                 viable/strict/1763764050    -> viable/strict/1763764050
2025-12-04T09:33:42.1963249Z  * [new tag]                 viable/strict/1763771887    -> viable/strict/1763771887
2025-12-04T09:33:42.1964625Z  * [new tag]                 viable/strict/1763773920    -> viable/strict/1763773920
2025-12-04T09:33:42.1965739Z  * [new tag]                 viable/strict/1763776501    -> viable/strict/1763776501
2025-12-04T09:33:42.1966803Z  * [new tag]                 viable/strict/1763779437    -> viable/strict/1763779437
2025-12-04T09:33:42.1968228Z  * [new tag]                 viable/strict/1763781038    -> viable/strict/1763781038
2025-12-04T09:33:42.1969334Z  * [new tag]                 viable/strict/1763782245    -> viable/strict/1763782245
2025-12-04T09:33:42.1970336Z  * [new tag]                 viable/strict/1763785568    -> viable/strict/1763785568
2025-12-04T09:33:42.1971512Z  * [new tag]                 viable/strict/1763787006    -> viable/strict/1763787006
2025-12-04T09:33:42.1972763Z  * [new tag]                 viable/strict/1763789103    -> viable/strict/1763789103
2025-12-04T09:33:42.1973904Z  * [new tag]                 viable/strict/1763790578    -> viable/strict/1763790578
2025-12-04T09:33:42.1975005Z  * [new tag]                 viable/strict/1763796275    -> viable/strict/1763796275
2025-12-04T09:33:42.1976432Z  * [new tag]                 viable/strict/1763801465    -> viable/strict/1763801465
2025-12-04T09:33:42.1977624Z  * [new tag]                 viable/strict/1763803522    -> viable/strict/1763803522
2025-12-04T09:33:42.1978748Z  * [new tag]                 viable/strict/1763808581    -> viable/strict/1763808581
2025-12-04T09:33:42.1979847Z  * [new tag]                 viable/strict/1763840977    -> viable/strict/1763840977
2025-12-04T09:33:42.1980931Z  * [new tag]                 viable/strict/1763846659    -> viable/strict/1763846659
2025-12-04T09:33:42.1982076Z  * [new tag]                 viable/strict/1763872065    -> viable/strict/1763872065
2025-12-04T09:33:42.1983353Z  * [new tag]                 viable/strict/1763873648    -> viable/strict/1763873648
2025-12-04T09:33:42.1984484Z  * [new tag]                 viable/strict/1763875506    -> viable/strict/1763875506
2025-12-04T09:33:42.1985510Z  * [new tag]                 viable/strict/1763889904    -> viable/strict/1763889904
2025-12-04T09:33:42.1986652Z  * [new tag]                 viable/strict/1763930999    -> viable/strict/1763930999
2025-12-04T09:33:42.1988271Z  * [new tag]                 viable/strict/1763944964    -> viable/strict/1763944964
2025-12-04T09:33:42.1989145Z  * [new tag]                 viable/strict/1763958474    -> viable/strict/1763958474
2025-12-04T09:33:42.1990423Z  * [new tag]                 viable/strict/1763967263    -> viable/strict/1763967263
2025-12-04T09:33:42.1991565Z  * [new tag]                 viable/strict/1763972803    -> viable/strict/1763972803
2025-12-04T09:33:42.1992690Z  * [new tag]                 viable/strict/1763976376    -> viable/strict/1763976376
2025-12-04T09:33:42.1993795Z  * [new tag]                 viable/strict/1763989404    -> viable/strict/1763989404
2025-12-04T09:33:42.1994910Z  * [new tag]                 viable/strict/1763990887    -> viable/strict/1763990887
2025-12-04T09:33:42.1996123Z  * [new tag]                 viable/strict/1764019919    -> viable/strict/1764019919
2025-12-04T09:33:42.1997317Z  * [new tag]                 viable/strict/1764023134    -> viable/strict/1764023134
2025-12-04T09:33:42.1998338Z  * [new tag]                 viable/strict/1764024593    -> viable/strict/1764024593
2025-12-04T09:33:42.1999422Z  * [new tag]                 viable/strict/1764026706    -> viable/strict/1764026706
2025-12-04T09:33:42.2000989Z  * [new tag]                 viable/strict/1764031139    -> viable/strict/1764031139
2025-12-04T09:33:42.2002304Z  * [new tag]                 viable/strict/1764033131    -> viable/strict/1764033131
2025-12-04T09:33:42.2003308Z  * [new tag]                 viable/strict/1764035725    -> viable/strict/1764035725
2025-12-04T09:33:42.2004150Z  * [new tag]                 viable/strict/1764624265    -> viable/strict/1764624265
2025-12-04T09:33:42.2005170Z  * [new tag]                 viable/strict/1764631514    -> viable/strict/1764631514
2025-12-04T09:33:42.2006179Z  * [new tag]                 viable/strict/1764632987    -> viable/strict/1764632987
2025-12-04T09:33:42.2007023Z  * [new tag]                 viable/strict/1764636063    -> viable/strict/1764636063
2025-12-04T09:33:42.2008016Z  * [new tag]                 viable/strict/1764643975    -> viable/strict/1764643975
2025-12-04T09:33:42.2008872Z  * [new tag]                 viable/strict/1764646859    -> viable/strict/1764646859
2025-12-04T09:33:42.2010033Z  * [new tag]                 viable/strict/1764653120    -> viable/strict/1764653120
2025-12-04T09:33:42.2010773Z  * [new tag]                 viable/strict/1764654632    -> viable/strict/1764654632
2025-12-04T09:33:42.2011767Z  * [new tag]                 viable/strict/1764656821    -> viable/strict/1764656821
2025-12-04T09:33:42.2012846Z  * [new tag]                 viable/strict/1764658557    -> viable/strict/1764658557
2025-12-04T09:33:42.2013710Z  * [new tag]                 viable/strict/1764660333    -> viable/strict/1764660333
2025-12-04T09:33:42.2014741Z  * [new tag]                 viable/strict/1764661812    -> viable/strict/1764661812
2025-12-04T09:33:42.2015577Z  * [new tag]                 viable/strict/1764664023    -> viable/strict/1764664023
2025-12-04T09:33:42.2016592Z  * [new tag]                 viable/strict/1764669150    -> viable/strict/1764669150
2025-12-04T09:33:42.2017462Z  * [new tag]                 viable/strict/1764680709    -> viable/strict/1764680709
2025-12-04T09:33:42.2018447Z  * [new tag]                 viable/strict/1764687619    -> viable/strict/1764687619
2025-12-04T09:33:42.2019326Z  * [new tag]                 viable/strict/1764696355    -> viable/strict/1764696355
2025-12-04T09:33:42.2020341Z  * [new tag]                 viable/strict/1764701767    -> viable/strict/1764701767
2025-12-04T09:33:42.2021218Z  * [new tag]                 viable/strict/1764710768    -> viable/strict/1764710768
2025-12-04T09:33:42.2022218Z  * [new tag]                 viable/strict/1764716202    -> viable/strict/1764716202
2025-12-04T09:33:42.2023070Z  * [new tag]                 viable/strict/1764793566    -> viable/strict/1764793566
2025-12-04T09:33:42.2024124Z  * [new tag]                 viable/strict/1764797093    -> viable/strict/1764797093
2025-12-04T09:33:42.2024982Z  * [new tag]                 viable/strict/1764800729    -> viable/strict/1764800729
2025-12-04T09:33:42.2026257Z  * [new tag]                 whc_flight_1                -> whc_flight_1
2025-12-04T09:33:42.2027432Z  * [new tag]                 whc_flight_2                -> whc_flight_2
2025-12-04T09:33:42.2028840Z  * [new tag]                 whc_flight_4                -> whc_flight_4
2025-12-04T09:33:42.2887158Z [command]/usr/bin/git rev-parse --verify --quiet ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32^{object}
2025-12-04T09:33:42.2917990Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:33:42.2923785Z ##[endgroup]
2025-12-04T09:33:42.2924561Z ##[group]Determining the checkout info
2025-12-04T09:33:42.2925482Z ##[endgroup]
2025-12-04T09:33:42.2930726Z [command]/usr/bin/git sparse-checkout disable
2025-12-04T09:33:42.2966112Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig
2025-12-04T09:33:42.2993569Z ##[group]Checking out the ref
2025-12-04T09:33:42.2998213Z [command]/usr/bin/git checkout --progress --force ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:33:43.3461707Z Updating files:  75% (15216/20121)
2025-12-04T09:33:43.3619747Z Updating files:  76% (15292/20121)
2025-12-04T09:33:43.3763218Z Updating files:  77% (15494/20121)
2025-12-04T09:33:43.3990839Z Updating files:  78% (15695/20121)
2025-12-04T09:33:43.4283926Z Updating files:  79% (15896/20121)
2025-12-04T09:33:43.4639915Z Updating files:  80% (16097/20121)
2025-12-04T09:33:43.4959656Z Updating files:  81% (16299/20121)
2025-12-04T09:33:43.5195473Z Updating files:  82% (16500/20121)
2025-12-04T09:33:43.5362498Z Updating files:  83% (16701/20121)
2025-12-04T09:33:43.5515450Z Updating files:  84% (16902/20121)
2025-12-04T09:33:43.5693288Z Updating files:  85% (17103/20121)
2025-12-04T09:33:43.5862142Z Updating files:  86% (17305/20121)
2025-12-04T09:33:43.6013958Z Updating files:  87% (17506/20121)
2025-12-04T09:33:43.6138214Z Updating files:  88% (17707/20121)
2025-12-04T09:33:43.6289684Z Updating files:  89% (17908/20121)
2025-12-04T09:33:43.6479461Z Updating files:  90% (18109/20121)
2025-12-04T09:33:43.6605615Z Updating files:  91% (18311/20121)
2025-12-04T09:33:43.6777168Z Updating files:  92% (18512/20121)
2025-12-04T09:33:43.6980619Z Updating files:  93% (18713/20121)
2025-12-04T09:33:43.7208256Z Updating files:  94% (18914/20121)
2025-12-04T09:33:43.7402288Z Updating files:  95% (19115/20121)
2025-12-04T09:33:43.7575425Z Updating files:  96% (19317/20121)
2025-12-04T09:33:43.7758337Z Updating files:  97% (19518/20121)
2025-12-04T09:33:43.8073447Z Updating files:  98% (19719/20121)
2025-12-04T09:33:43.8267213Z Updating files:  99% (19920/20121)
2025-12-04T09:33:43.8267600Z Updating files: 100% (20121/20121)
2025-12-04T09:33:43.8267954Z Updating files: 100% (20121/20121), done.
2025-12-04T09:33:43.8578213Z Note: switching to 'ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32'.
2025-12-04T09:33:43.8578610Z 
2025-12-04T09:33:43.8578861Z You are in 'detached HEAD' state. You can look around, make experimental
2025-12-04T09:33:43.8579511Z changes and commit them, and you can discard any commits you make in this
2025-12-04T09:33:43.8580163Z state without impacting any branches by switching back to a branch.
2025-12-04T09:33:43.8580564Z 
2025-12-04T09:33:43.8580820Z If you want to create a new branch to retain commits you create, you may
2025-12-04T09:33:43.8581402Z do so (now or later) by using -c with the switch command. Example:
2025-12-04T09:33:43.8581757Z 
2025-12-04T09:33:43.8581883Z   git switch -c <new-branch-name>
2025-12-04T09:33:43.8582343Z 
2025-12-04T09:33:43.8582477Z Or undo this operation with:
2025-12-04T09:33:43.8582686Z 
2025-12-04T09:33:43.8582786Z   git switch -
2025-12-04T09:33:43.8582950Z 
2025-12-04T09:33:43.8583226Z Turn off this advice by setting config variable advice.detachedHead to false
2025-12-04T09:33:43.8583651Z 
2025-12-04T09:33:43.8583996Z HEAD is now at ffd9b0fb435 Resolve collective autotuning test failure on arm (#168919)
2025-12-04T09:33:43.8671150Z ##[endgroup]
2025-12-04T09:33:43.8671675Z ##[group]Setting up auth for fetching submodules
2025-12-04T09:33:43.8677787Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic ***
2025-12-04T09:33:43.8731836Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf
2025-12-04T09:33:43.8761176Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com:
2025-12-04T09:33:43.8790744Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com:
2025-12-04T09:33:43.8815889Z ##[endgroup]
2025-12-04T09:33:43.8816408Z ##[group]Fetching submodules
2025-12-04T09:33:43.8820464Z [command]/usr/bin/git submodule sync --recursive
2025-12-04T09:33:43.9161241Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive
2025-12-04T09:33:43.9502009Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni'
2025-12-04T09:33:43.9504508Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
2025-12-04T09:33:43.9507880Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv'
2025-12-04T09:33:43.9512387Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
2025-12-04T09:33:43.9516721Z Submodule 'third_party/NVTX' (https://github.com/NVIDIA/NVTX.git) registered for path 'third_party/NVTX'
2025-12-04T09:33:43.9522077Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator'
2025-12-04T09:33:43.9526394Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
2025-12-04T09:33:43.9531272Z Submodule 'third_party/aiter' (https://github.com/ROCm/aiter.git) registered for path 'third_party/aiter'
2025-12-04T09:33:43.9536301Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark'
2025-12-04T09:33:43.9542191Z Submodule 'third_party/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/composable_kernel'
2025-12-04T09:33:43.9547214Z Submodule 'third_party/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'third_party/cpp-httplib'
2025-12-04T09:33:43.9552575Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo'
2025-12-04T09:33:43.9558228Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend'
2025-12-04T09:33:43.9564125Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass'
2025-12-04T09:33:43.9569691Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm'
2025-12-04T09:33:43.9575745Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'third_party/flash-attention'
2025-12-04T09:33:43.9583369Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers'
2025-12-04T09:33:43.9589298Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt'
2025-12-04T09:33:43.9595723Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:33:43.9601854Z Submodule 'third_party/gloo' (https://github.com/pytorch/gloo) registered for path 'third_party/gloo'
2025-12-04T09:33:43.9608623Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest'
2025-12-04T09:33:43.9614807Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep'
2025-12-04T09:33:43.9621350Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi'
2025-12-04T09:33:43.9627970Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto'
2025-12-04T09:33:43.9634881Z Submodule 'third_party/kleidiai' (https://github.com/ARM-software/kleidiai.git) registered for path 'third_party/kleidiai'
2025-12-04T09:33:43.9641771Z Submodule 'third_party/mimalloc' (https://github.com/microsoft/mimalloc.git) registered for path 'third_party/mimalloc'
2025-12-04T09:33:43.9648842Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann'
2025-12-04T09:33:43.9655818Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx'
2025-12-04T09:33:43.9661738Z Submodule 'third_party/opentelemetry-cpp' (https://github.com/open-telemetry/opentelemetry-cpp.git) registered for path 'third_party/opentelemetry-cpp'
2025-12-04T09:33:43.9667389Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft'
2025-12-04T09:33:43.9673472Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf'
2025-12-04T09:33:43.9679583Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd'
2025-12-04T09:33:43.9686140Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool'
2025-12-04T09:33:43.9694336Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11'
2025-12-04T09:33:43.9700395Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy'
2025-12-04T09:33:43.9707080Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef'
2025-12-04T09:33:43.9714003Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe'
2025-12-04T09:33:43.9748511Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'...
2025-12-04T09:33:44.2010666Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'...
2025-12-04T09:33:44.2011667Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'...
2025-12-04T09:33:44.2048310Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'...
2025-12-04T09:33:47.9644149Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'...
2025-12-04T09:33:47.9646410Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'...
2025-12-04T09:33:47.9648376Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NVTX'...
2025-12-04T09:33:47.9650169Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'...
2025-12-04T09:33:47.9652096Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'...
2025-12-04T09:33:47.9654173Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention'...
2025-12-04T09:33:47.9656165Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpp-httplib'...
2025-12-04T09:33:47.9658321Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'...
2025-12-04T09:33:47.9660481Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'...
2025-12-04T09:33:47.9662265Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'...
2025-12-04T09:33:47.9664887Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kleidiai'...
2025-12-04T09:33:47.9666734Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'...
2025-12-04T09:33:47.9750940Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'...
2025-12-04T09:33:47.9752666Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'...
2025-12-04T09:33:47.9754317Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/mimalloc'...
2025-12-04T09:33:47.9756023Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'...
2025-12-04T09:33:47.9757736Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'...
2025-12-04T09:33:47.9759441Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'...
2025-12-04T09:33:48.1462410Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'...
2025-12-04T09:33:48.1567564Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'...
2025-12-04T09:34:09.3567168Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'...
2025-12-04T09:34:09.3574991Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'...
2025-12-04T09:34:09.3579746Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'...
2025-12-04T09:34:09.3581411Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'...
2025-12-04T09:34:09.3583006Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'...
2025-12-04T09:34:09.3584415Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'...
2025-12-04T09:34:09.3585934Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'...
2025-12-04T09:34:09.3587458Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'...
2025-12-04T09:34:09.3589142Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'...
2025-12-04T09:34:09.3591343Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'...
2025-12-04T09:34:09.4568250Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'...
2025-12-04T09:34:13.4608759Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter'...
2025-12-04T09:34:13.4609690Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'...
2025-12-04T09:34:13.4788605Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f'
2025-12-04T09:34:13.4932352Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3'
2025-12-04T09:34:13.5044561Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1'
2025-12-04T09:34:13.5338383Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73'
2025-12-04T09:34:13.6315382Z Submodule path 'third_party/NVTX': checked out '3ebbc93ded7285963bff932c678fa367eb393ba6'
2025-12-04T09:34:13.6964997Z Submodule path 'third_party/VulkanMemoryAllocator': checked out '1d8f600fd424278486eade7ed3e877c99f0846b1'
2025-12-04T09:34:14.5549519Z Submodule path 'third_party/XNNPACK': checked out '51a0103656eff6fc9bfd39a4597923c4b542c883'
2025-12-04T09:34:14.7741573Z Submodule path 'third_party/aiter': checked out '01aae101b9e5e94d6c16a9514c9fb8df99c93150'
2025-12-04T09:34:14.7763993Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:34:14.7793541Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/aiter/3rdparty/composable_kernel'...
2025-12-04T09:34:20.1125669Z Submodule path 'third_party/aiter/3rdparty/composable_kernel': checked out 'cffe8fa2a442ac8e80dd236a1a5d24fe3d7e0cbf'
2025-12-04T09:34:20.1407362Z Submodule path 'third_party/benchmark': checked out '299e5928955cc62af9968370293b916f5130916f'
2025-12-04T09:34:20.5544747Z Submodule path 'third_party/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977'
2025-12-04T09:34:20.6134925Z Submodule path 'third_party/cpp-httplib': checked out '89c932f313c6437c38f2982869beacc89c2f2246'
2025-12-04T09:34:20.7259755Z Submodule path 'third_party/cpuinfo': checked out 'f858c30bcb16f8effd5ff46996f0514539e17abc'
2025-12-04T09:34:20.7822585Z Submodule path 'third_party/cudnn_frontend': checked out '0b1577c8c83401237d601d0d0db5210506705396'
2025-12-04T09:34:21.5312660Z Submodule path 'third_party/cutlass': checked out 'f88806b1e31dfa579842638740216dd41fc6c588'
2025-12-04T09:34:21.7129658Z Submodule path 'third_party/fbgemm': checked out 'c0b988d39a9e47c794d699f29930ed4d7c7e13a4'
2025-12-04T09:34:21.7154161Z Submodule 'external/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/external/asmjit'
2025-12-04T09:34:21.7157235Z Submodule 'external/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:34:21.7160118Z Submodule 'external/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:34:21.7163353Z Submodule 'external/cutlass' (https://github.com/jwfromm/cutlass) registered for path 'third_party/fbgemm/external/cutlass'
2025-12-04T09:34:21.7166704Z Submodule 'external/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/external/googletest'
2025-12-04T09:34:21.7170164Z Submodule 'external/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:34:21.7173503Z Submodule 'external/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/fbgemm/external/json'
2025-12-04T09:34:21.7206265Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/asmjit'...
2025-12-04T09:34:23.1000084Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/hipify_torch'...
2025-12-04T09:34:23.1001363Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cpuinfo'...
2025-12-04T09:34:23.1002570Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/googletest'...
2025-12-04T09:34:23.2001434Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/composable_kernel'...
2025-12-04T09:34:26.8271496Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/cutlass'...
2025-12-04T09:34:26.9272239Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/external/json'...
2025-12-04T09:34:30.1851965Z Submodule path 'third_party/fbgemm/external/asmjit': checked out 'a3199e8857792cd10b7589ff5d58343d2c9008ea'
2025-12-04T09:34:30.5977215Z Submodule path 'third_party/fbgemm/external/composable_kernel': checked out '7fe50dc3da2069d6645d9deb8c017a876472a977'
2025-12-04T09:34:30.7145983Z Submodule path 'third_party/fbgemm/external/cpuinfo': checked out '6543fec09b2f04ac4a666882998b534afc9c1349'
2025-12-04T09:34:31.4522088Z Submodule path 'third_party/fbgemm/external/cutlass': checked out '98125ce499b0fdf7ffbe0e3052f5b8709f4840f8'
2025-12-04T09:34:31.5061566Z Submodule path 'third_party/fbgemm/external/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:34:31.5201155Z Submodule path 'third_party/fbgemm/external/hipify_torch': checked out '63b6a7b541fa7f08f8475ca7d74054db36ff2691'
2025-12-04T09:34:31.6381088Z Submodule path 'third_party/fbgemm/external/json': checked out '9cca280a4d0ccf0c08f47a99aa71d1b0e52f8d03'
2025-12-04T09:34:31.7203548Z Submodule path 'third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5'
2025-12-04T09:34:31.7225615Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:34:31.7228169Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:34:31.7260021Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/composable_kernel'...
2025-12-04T09:34:36.5073685Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flash-attention/csrc/cutlass'...
2025-12-04T09:34:36.7944589Z Submodule path 'third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33'
2025-12-04T09:34:37.4508170Z Submodule path 'third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420'
2025-12-04T09:34:37.6133104Z Submodule path 'third_party/flatbuffers': checked out 'a2cd1ea3b6d3fee220106b5fed3f7ce8da9eb757'
2025-12-04T09:34:37.6477676Z Submodule path 'third_party/fmt': checked out '407c905e45ad75fc29bf0f9bb7c5c2fd3475976f'
2025-12-04T09:34:37.6943071Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350'
2025-12-04T09:34:37.7244068Z Submodule path 'third_party/gloo': checked out '54cbae0d3a67fa890b4c3d9ee162b7860315e341'
2025-12-04T09:34:37.7773092Z Submodule path 'third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:34:37.7928707Z Submodule path 'third_party/ideep': checked out '719d8e6cd7f7a0e01b155657526d693acf97c2b3'
2025-12-04T09:34:37.7947733Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn'
2025-12-04T09:34:37.7976337Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'...
2025-12-04T09:34:56.1120282Z Submodule path 'third_party/ideep/mkl-dnn': checked out '8d263e693366ef8db40acc569cc7d8edf644556d'
2025-12-04T09:34:56.1361042Z Submodule path 'third_party/ittapi': checked out 'dec1d23ca65ab069d225dfe40dea14f455170959'
2025-12-04T09:34:56.2358721Z Submodule path 'third_party/kineto': checked out '31f85df8fbd89c188f14ef10f1ec65379786b943'
2025-12-04T09:34:56.2380323Z Submodule 'libkineto/third_party/dynolog' (https://github.com/facebookincubator/dynolog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:34:56.2383032Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:34:56.2386197Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:34:56.2417488Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog'...
2025-12-04T09:34:57.0156938Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'...
2025-12-04T09:34:57.7030344Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'...
2025-12-04T09:34:57.8088319Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog': checked out 'd2ffe0a4e3acace628db49974246b66fc3e85fb1'
2025-12-04T09:34:57.8106671Z Submodule 'third_party/DCGM' (https://github.com/NVIDIA/DCGM.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:34:57.8109619Z Submodule 'third_party/cpr' (https://github.com/libcpr/cpr.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:34:57.8112802Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:34:57.8116256Z Submodule 'third_party/gflags' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:34:57.8119587Z Submodule 'third_party/glog' (https://github.com/google/glog.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:34:57.8123211Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:34:57.8126893Z Submodule 'third_party/json' (https://github.com/nlohmann/json.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:34:57.8130589Z Submodule 'third_party/pfs' (https://github.com/dtrugman/pfs.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:34:57.8134657Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:34:57.8167436Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'...
2025-12-04T09:34:59.8366822Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'...
2025-12-04T09:34:59.8368302Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'...
2025-12-04T09:34:59.8369897Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'...
2025-12-04T09:34:59.8371264Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'...
2025-12-04T09:34:59.8372595Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/glog'...
2025-12-04T09:34:59.8373983Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'...
2025-12-04T09:34:59.8375569Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'...
2025-12-04T09:34:59.9367380Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/json'...
2025-12-04T09:35:06.3839549Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM': checked out 'ffde4e54bc7249a6039a5e6b45b395141e1217f9'
2025-12-04T09:35:06.4050853Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr': checked out '871ed52d350214a034f6ef8a3b8f51c5ce1bd400'
2025-12-04T09:35:06.4477534Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05'
2025-12-04T09:35:06.4639349Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags': checked out 'e171aa2d15ed9eb17054558e0b3a6a413bb01067'
2025-12-04T09:35:06.4657720Z Submodule 'doc' (https://github.com/gflags/gflags.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:06.4687263Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'...
2025-12-04T09:35:06.7619235Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc': checked out '8411df715cf522606e3b1aca386ddfc0b63d34b4'
2025-12-04T09:35:06.7841327Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog': checked out 'b33e3bad4c46c8a6345525fd822af355e5ef9446'
2025-12-04T09:35:06.8377004Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:35:06.9518417Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/json': checked out '4f8fba14066156b73f1189a2b8bd568bde5284c5'
2025-12-04T09:35:06.9716110Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs': checked out 'f68a2fa8ea36c783bdd760371411fcb495aa3150'
2025-12-04T09:35:06.9919236Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp': checked out 'b1234816facfdda29845c46696a02998a4af115a'
2025-12-04T09:35:06.9938491Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:06.9941624Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:06.9971821Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'...
2025-12-04T09:35:09.3699722Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'...
2025-12-04T09:35:09.6609397Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'd7ba35bbb649209c66e582d5a0244ba988a15159'
2025-12-04T09:35:09.7154364Z Submodule path 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
2025-12-04T09:35:09.7531569Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '40626af88bd7df9a5fb80be7b25ac85b122d6c21'
2025-12-04T09:35:09.8068815Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '52eb8108c5bdec04579160ae17225d66034bd723'
2025-12-04T09:35:09.8697341Z Submodule path 'third_party/kleidiai': checked out 'd7770c89632329a9914ef1a90289917597639cbe'
2025-12-04T09:35:09.9156214Z Submodule path 'third_party/mimalloc': checked out 'fbd8b99c2b828428947d70fdc046bb55609be93e'
2025-12-04T09:35:10.0462575Z Submodule path 'third_party/nlohmann': checked out '55f93686c01528224f448c19128836e7df245f72'
2025-12-04T09:35:10.5233739Z Submodule path 'third_party/onnx': checked out 'e709452ef2bbc1d113faf678c24e6d3467696e83'
2025-12-04T09:35:10.5278188Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:10.5311011Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'...
2025-12-04T09:35:11.4379121Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'a2e59f0e7065404b44dfe92a28aca47ba1378dc4'
2025-12-04T09:35:11.5208469Z Submodule path 'third_party/opentelemetry-cpp': checked out 'a799f4aed9c94b765dcdaabaeab7d5e7e2310878'
2025-12-04T09:35:11.5232158Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark) registered for path 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:11.5235198Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:11.5238113Z Submodule 'third_party/ms-gsl' (https://github.com/microsoft/GSL) registered for path 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:11.5241350Z Submodule 'third_party/nlohmann-json' (https://github.com/nlohmann/json) registered for path 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:11.5245009Z Submodule 'third_party/opentelemetry-proto' (https://github.com/open-telemetry/opentelemetry-proto) registered for path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:11.5248296Z Submodule 'third_party/opentracing-cpp' (https://github.com/opentracing/opentracing-cpp.git) registered for path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:11.5251773Z Submodule 'third_party/prometheus-cpp' (https://github.com/jupp0r/prometheus-cpp) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:11.5255229Z Submodule 'tools/vcpkg' (https://github.com/Microsoft/vcpkg) registered for path 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:11.5287294Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/benchmark'...
2025-12-04T09:35:11.9726414Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentracing-cpp'...
2025-12-04T09:35:11.9728645Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/opentelemetry-proto'...
2025-12-04T09:35:11.9730675Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp'...
2025-12-04T09:35:11.9732665Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/ms-gsl'...
2025-12-04T09:35:12.0727322Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/googletest'...
2025-12-04T09:35:12.7888308Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/nlohmann-json'...
2025-12-04T09:35:20.5998163Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/tools/vcpkg'...
2025-12-04T09:35:21.3395603Z Submodule path 'third_party/opentelemetry-cpp/third_party/benchmark': checked out 'd572f4777349d43653b21d6c2fc63020ab326db2'
2025-12-04T09:35:21.3870404Z Submodule path 'third_party/opentelemetry-cpp/third_party/googletest': checked out 'b796f7d44681514f58a683a3a71ff17c94edb0c1'
2025-12-04T09:35:21.4069609Z Submodule path 'third_party/opentelemetry-cpp/third_party/ms-gsl': checked out '6f4529395c5b7c2d661812257cd6780c67e54afa'
2025-12-04T09:35:21.5283154Z Submodule path 'third_party/opentelemetry-cpp/third_party/nlohmann-json': checked out 'bc889afb4c5bf1c0d8ee29ef35eaaf4c8bef8a5d'
2025-12-04T09:35:21.5447223Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto': checked out '4ca4f0335c63cda7ab31ea7ed70d6553aee14dce'
2025-12-04T09:35:21.5622771Z Submodule path 'third_party/opentelemetry-cpp/third_party/opentracing-cpp': checked out '06b57f48ded1fa3bdd3d4346f6ef29e40e08eaf5'
2025-12-04T09:35:21.5811204Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp': checked out 'c9ffcdda9086ffd9e1283ea7a0276d831f3c8a8d'
2025-12-04T09:35:21.5829539Z Submodule 'civetweb' (https://github.com/civetweb/civetweb.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:21.5832514Z Submodule 'googletest' (https://github.com/google/googletest.git) registered for path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:21.5863005Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'...
2025-12-04T09:35:23.9628125Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'...
2025-12-04T09:35:24.2534934Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb': checked out 'eefb26f82b233268fc98577d265352720d477ba4'
2025-12-04T09:35:24.3074263Z Submodule path 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
2025-12-04T09:35:24.8671885Z Submodule path 'third_party/opentelemetry-cpp/tools/vcpkg': checked out '8eb57355a4ffb410a2e94c07b4dca2dffbee8e50'
2025-12-04T09:35:24.8818554Z Submodule path 'third_party/pocketfft': checked out '0fa0ef591e38c2758e3184c6c23e497b9f732ffa'
2025-12-04T09:35:25.1946633Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a'
2025-12-04T09:35:25.1971963Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:25.1974837Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:25.2006181Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'...
2025-12-04T09:35:25.7606332Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'...
2025-12-04T09:35:26.2072398Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8'
2025-12-04T09:35:26.2917370Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081'
2025-12-04T09:35:26.3030794Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900'
2025-12-04T09:35:26.3176732Z Submodule path 'third_party/pthreadpool': checked out '4fe0e1e183925bf8cfa6aae24237e724a96479b8'
2025-12-04T09:35:26.3673581Z Submodule path 'third_party/pybind11': checked out 'f5fbe867d2d26e4a0a9177a51f6e568868ad3dc8'
2025-12-04T09:35:26.4011347Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67'
2025-12-04T09:35:26.4520166Z Submodule path 'third_party/sleef': checked out '5a1d179df9cf652951b59010a2d2075372d67f68'
2025-12-04T09:35:26.4854091Z Submodule path 'third_party/tensorpipe': checked out '2b4cd91092d335a697416b2a3cb398283246849d'
2025-12-04T09:35:26.4875926Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:26.4878726Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:26.4882076Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:26.4885258Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:26.4917649Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'...
2025-12-04T09:35:27.7780474Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'...
2025-12-04T09:35:27.7781636Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'...
2025-12-04T09:35:27.7922466Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'...
2025-12-04T09:35:27.8594469Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e'
2025-12-04T09:35:27.8785707Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281'
2025-12-04T09:35:27.9654440Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '5152db2cbfeb5582e9c27c5ea1dba2cd9e10759b'
2025-12-04T09:35:27.9997326Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef'
2025-12-04T09:35:28.0016643Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:28.0046923Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'...
2025-12-04T09:35:28.2178451Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
2025-12-04T09:35:28.2220274Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0
2025-12-04T09:35:28.2561729Z Entering 'android/libs/fbjni'
2025-12-04T09:35:28.2610408Z Entering 'third_party/FP16'
2025-12-04T09:35:28.2657539Z Entering 'third_party/FXdiv'
2025-12-04T09:35:28.2705706Z Entering 'third_party/NNPACK'
2025-12-04T09:35:28.2756507Z Entering 'third_party/NVTX'
2025-12-04T09:35:28.2806939Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:28.2854837Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:28.2921237Z Entering 'third_party/aiter'
2025-12-04T09:35:28.2969635Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:28.3028406Z Entering 'third_party/benchmark'
2025-12-04T09:35:28.3076037Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:28.3134666Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:28.3182624Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:28.3231149Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:28.3281602Z Entering 'third_party/cutlass'
2025-12-04T09:35:28.3340956Z Entering 'third_party/fbgemm'
2025-12-04T09:35:28.3391538Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:28.3439037Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:28.3499619Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:28.3551451Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:28.3609975Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:28.3656612Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:28.3703608Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:28.3753734Z Entering 'third_party/flash-attention'
2025-12-04T09:35:28.3804563Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:28.3857322Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:28.3915579Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:28.3966777Z Entering 'third_party/fmt'
2025-12-04T09:35:28.4017311Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:28.4065071Z Entering 'third_party/gloo'
2025-12-04T09:35:28.4114055Z Entering 'third_party/googletest'
2025-12-04T09:35:28.4161730Z Entering 'third_party/ideep'
2025-12-04T09:35:28.4209008Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:28.4266438Z Entering 'third_party/ittapi'
2025-12-04T09:35:28.4316049Z Entering 'third_party/kineto'
2025-12-04T09:35:28.4363847Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:28.4411041Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:28.4458568Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:28.4506377Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:28.4553707Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:28.4600726Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:28.4649162Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:28.4697480Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:28.4745853Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:28.4795863Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:28.4843265Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:28.4890588Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:28.4941962Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:28.4993895Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:28.5041873Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:28.5090644Z Entering 'third_party/kleidiai'
2025-12-04T09:35:28.5140682Z Entering 'third_party/mimalloc'
2025-12-04T09:35:28.5187272Z Entering 'third_party/nlohmann'
2025-12-04T09:35:28.5237072Z Entering 'third_party/onnx'
2025-12-04T09:35:28.5306785Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:28.5358384Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:28.5409819Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:28.5457242Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:28.5504831Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:28.5553276Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:28.5601771Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:28.5647537Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:28.5693320Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:28.5740678Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:28.5789139Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:28.5839005Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:28.5909249Z Entering 'third_party/pocketfft'
2025-12-04T09:35:28.5958275Z Entering 'third_party/protobuf'
2025-12-04T09:35:28.6012403Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:28.6058591Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:28.6108868Z Entering 'third_party/psimd'
2025-12-04T09:35:28.6156535Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:28.6204617Z Entering 'third_party/pybind11'
2025-12-04T09:35:28.6253039Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:28.6302530Z Entering 'third_party/sleef'
2025-12-04T09:35:28.6350690Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:28.6398365Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:28.6444259Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:28.6489665Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:28.6537403Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:28.6584181Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:28.6647201Z ##[endgroup]
2025-12-04T09:35:28.6647764Z ##[group]Persisting credentials for submodules
2025-12-04T09:35:28.6654066Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :"
2025-12-04T09:35:28.6994218Z Entering 'android/libs/fbjni'
2025-12-04T09:35:28.7060508Z Entering 'third_party/FP16'
2025-12-04T09:35:28.7125706Z Entering 'third_party/FXdiv'
2025-12-04T09:35:28.7189026Z Entering 'third_party/NNPACK'
2025-12-04T09:35:28.7251982Z Entering 'third_party/NVTX'
2025-12-04T09:35:28.7319065Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:28.7382588Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:28.7464397Z Entering 'third_party/aiter'
2025-12-04T09:35:28.7529150Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:28.7602768Z Entering 'third_party/benchmark'
2025-12-04T09:35:28.7666066Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:28.7738033Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:28.7801983Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:28.7865167Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:28.7928981Z Entering 'third_party/cutlass'
2025-12-04T09:35:28.8003688Z Entering 'third_party/fbgemm'
2025-12-04T09:35:28.8070321Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:28.8134484Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:28.8210112Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:28.8273819Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:28.8347099Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:28.8410459Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:28.8472778Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:28.8537755Z Entering 'third_party/flash-attention'
2025-12-04T09:35:28.8605258Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:28.8674871Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:28.8751146Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:28.8818734Z Entering 'third_party/fmt'
2025-12-04T09:35:28.8881608Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:28.8943967Z Entering 'third_party/gloo'
2025-12-04T09:35:28.9011908Z Entering 'third_party/googletest'
2025-12-04T09:35:28.9075687Z Entering 'third_party/ideep'
2025-12-04T09:35:28.9138205Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:28.9208965Z Entering 'third_party/ittapi'
2025-12-04T09:35:28.9272800Z Entering 'third_party/kineto'
2025-12-04T09:35:28.9340169Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:28.9402688Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:28.9465507Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:28.9530807Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:28.9592680Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:28.9654572Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:28.9718906Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:28.9787182Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:28.9854604Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:28.9920523Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:28.9987369Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:29.0049029Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:29.0114988Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:29.0186304Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:29.0252704Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:29.0320687Z Entering 'third_party/kleidiai'
2025-12-04T09:35:29.0386237Z Entering 'third_party/mimalloc'
2025-12-04T09:35:29.0451577Z Entering 'third_party/nlohmann'
2025-12-04T09:35:29.0519786Z Entering 'third_party/onnx'
2025-12-04T09:35:29.0606277Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:29.0671535Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:29.0738020Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:29.0800459Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:29.0863081Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:29.0925575Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:29.0989916Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:29.1054504Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:29.1116890Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:29.1176688Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:29.1241317Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:29.1308555Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:29.1394501Z Entering 'third_party/pocketfft'
2025-12-04T09:35:29.1458373Z Entering 'third_party/protobuf'
2025-12-04T09:35:29.1527010Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:29.1589623Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:29.1656382Z Entering 'third_party/psimd'
2025-12-04T09:35:29.1720656Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:29.1785721Z Entering 'third_party/pybind11'
2025-12-04T09:35:29.1849662Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:29.1913163Z Entering 'third_party/sleef'
2025-12-04T09:35:29.1976451Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:29.2040210Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:29.2102440Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:29.2164568Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:29.2227963Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:29.2292862Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:29.2377157Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url"
2025-12-04T09:35:29.2721476Z Entering 'android/libs/fbjni'
2025-12-04T09:35:29.2779251Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config	remote.origin.url
2025-12-04T09:35:29.2798757Z Entering 'third_party/FP16'
2025-12-04T09:35:29.2857832Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config	remote.origin.url
2025-12-04T09:35:29.2876575Z Entering 'third_party/FXdiv'
2025-12-04T09:35:29.2935467Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config	remote.origin.url
2025-12-04T09:35:29.2953744Z Entering 'third_party/NNPACK'
2025-12-04T09:35:29.3013781Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config	remote.origin.url
2025-12-04T09:35:29.3032330Z Entering 'third_party/NVTX'
2025-12-04T09:35:29.3092175Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config	remote.origin.url
2025-12-04T09:35:29.3112139Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:29.3171011Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config	remote.origin.url
2025-12-04T09:35:29.3189784Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:29.3248962Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config	remote.origin.url
2025-12-04T09:35:29.3283644Z Entering 'third_party/aiter'
2025-12-04T09:35:29.3341864Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config	remote.origin.url
2025-12-04T09:35:29.3361185Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:29.3422647Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config	remote.origin.url
2025-12-04T09:35:29.3452793Z Entering 'third_party/benchmark'
2025-12-04T09:35:29.3512487Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T09:35:29.3530840Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:29.3589019Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config	remote.origin.url
2025-12-04T09:35:29.3617748Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:29.3676927Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config	remote.origin.url
2025-12-04T09:35:29.3695581Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:29.3754466Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config	remote.origin.url
2025-12-04T09:35:29.3773751Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:29.3832843Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config	remote.origin.url
2025-12-04T09:35:29.3851292Z Entering 'third_party/cutlass'
2025-12-04T09:35:29.3910295Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config	remote.origin.url
2025-12-04T09:35:29.3940986Z Entering 'third_party/fbgemm'
2025-12-04T09:35:29.4001113Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config	remote.origin.url
2025-12-04T09:35:29.4021657Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:29.4080662Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config	remote.origin.url
2025-12-04T09:35:29.4098200Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:29.4157498Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config	remote.origin.url
2025-12-04T09:35:29.4184664Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:29.4247995Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config	remote.origin.url
2025-12-04T09:35:29.4266487Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:29.4326552Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config	remote.origin.url
2025-12-04T09:35:29.4354311Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:29.4413495Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config	remote.origin.url
2025-12-04T09:35:29.4431228Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:29.4495754Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config	remote.origin.url
2025-12-04T09:35:29.4513228Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:29.4572813Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config	remote.origin.url
2025-12-04T09:35:29.4593739Z Entering 'third_party/flash-attention'
2025-12-04T09:35:29.4653174Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config	remote.origin.url
2025-12-04T09:35:29.4671157Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:29.4730091Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config	remote.origin.url
2025-12-04T09:35:29.4754793Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:29.4814540Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config	remote.origin.url
2025-12-04T09:35:29.4842132Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:29.4902310Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config	remote.origin.url
2025-12-04T09:35:29.4922560Z Entering 'third_party/fmt'
2025-12-04T09:35:29.4981625Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config	remote.origin.url
2025-12-04T09:35:29.5000738Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:29.5059345Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config	remote.origin.url
2025-12-04T09:35:29.5077986Z Entering 'third_party/gloo'
2025-12-04T09:35:29.5136262Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config	remote.origin.url
2025-12-04T09:35:29.5155031Z Entering 'third_party/googletest'
2025-12-04T09:35:29.5213398Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:29.5231788Z Entering 'third_party/ideep'
2025-12-04T09:35:29.5291461Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config	remote.origin.url
2025-12-04T09:35:29.5309242Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:29.5366514Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config	remote.origin.url
2025-12-04T09:35:29.5394230Z Entering 'third_party/ittapi'
2025-12-04T09:35:29.5454040Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config	remote.origin.url
2025-12-04T09:35:29.5472507Z Entering 'third_party/kineto'
2025-12-04T09:35:29.5533838Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config	remote.origin.url
2025-12-04T09:35:29.5552429Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:29.5613801Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config	remote.origin.url
2025-12-04T09:35:29.5631282Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:29.5691545Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config	remote.origin.url
2025-12-04T09:35:29.5711287Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:29.5771350Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config	remote.origin.url
2025-12-04T09:35:29.5791114Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:29.5851423Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config	remote.origin.url
2025-12-04T09:35:29.5869034Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:29.5928633Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config	remote.origin.url
2025-12-04T09:35:29.5945545Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:29.6005018Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config	remote.origin.url
2025-12-04T09:35:29.6024626Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:29.6084244Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config	remote.origin.url
2025-12-04T09:35:29.6103127Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:29.6161580Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:29.6179368Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:29.6238383Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config	remote.origin.url
2025-12-04T09:35:29.6257416Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:29.6317434Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config	remote.origin.url
2025-12-04T09:35:29.6335594Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:29.6393172Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T09:35:29.6410116Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:29.6469714Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T09:35:29.6489942Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:29.6552345Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T09:35:29.6575689Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:29.6633714Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config	remote.origin.url
2025-12-04T09:35:29.6651059Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:29.6709098Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:29.6728795Z Entering 'third_party/kleidiai'
2025-12-04T09:35:29.6787798Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config	remote.origin.url
2025-12-04T09:35:29.6808980Z Entering 'third_party/mimalloc'
2025-12-04T09:35:29.6867300Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config	remote.origin.url
2025-12-04T09:35:29.6886076Z Entering 'third_party/nlohmann'
2025-12-04T09:35:29.6945917Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config	remote.origin.url
2025-12-04T09:35:29.6965856Z Entering 'third_party/onnx'
2025-12-04T09:35:29.7025724Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config	remote.origin.url
2025-12-04T09:35:29.7065073Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:29.7124221Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T09:35:29.7145747Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:29.7208698Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config	remote.origin.url
2025-12-04T09:35:29.7228646Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:29.7287480Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T09:35:29.7307171Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:29.7366930Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:29.7385735Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:29.7446232Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config	remote.origin.url
2025-12-04T09:35:29.7463897Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:29.7522827Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config	remote.origin.url
2025-12-04T09:35:29.7542194Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:29.7601712Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config	remote.origin.url
2025-12-04T09:35:29.7619651Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:29.7679011Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config	remote.origin.url
2025-12-04T09:35:29.7696743Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:29.7755221Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T09:35:29.7772062Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:29.7831522Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T09:35:29.7851467Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:29.7909450Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T09:35:29.7929464Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:29.7992202Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config	remote.origin.url
2025-12-04T09:35:29.8033017Z Entering 'third_party/pocketfft'
2025-12-04T09:35:29.8091694Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config	remote.origin.url
2025-12-04T09:35:29.8111974Z Entering 'third_party/protobuf'
2025-12-04T09:35:29.8170283Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config	remote.origin.url
2025-12-04T09:35:29.8192543Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:29.8258302Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T09:35:29.8276616Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:29.8336335Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:29.8357366Z Entering 'third_party/psimd'
2025-12-04T09:35:29.8416581Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config	remote.origin.url
2025-12-04T09:35:29.8435665Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:29.8495881Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config	remote.origin.url
2025-12-04T09:35:29.8515391Z Entering 'third_party/pybind11'
2025-12-04T09:35:29.8573782Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T09:35:29.8592842Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:29.8654416Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config	remote.origin.url
2025-12-04T09:35:29.8673711Z Entering 'third_party/sleef'
2025-12-04T09:35:29.8735242Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config	remote.origin.url
2025-12-04T09:35:29.8754118Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:29.8815644Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config	remote.origin.url
2025-12-04T09:35:29.8833865Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:29.8892286Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config	remote.origin.url
2025-12-04T09:35:29.8912181Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:29.8970830Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config	remote.origin.url
2025-12-04T09:35:29.8988005Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:29.9046504Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config	remote.origin.url
2025-12-04T09:35:29.9064515Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:29.9124214Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T09:35:29.9140527Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:29.9199990Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config	remote.origin.url
2025-12-04T09:35:30.0198560Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:'
2025-12-04T09:35:30.0543636Z Entering 'android/libs/fbjni'
2025-12-04T09:35:30.0592669Z Entering 'third_party/FP16'
2025-12-04T09:35:30.0641199Z Entering 'third_party/FXdiv'
2025-12-04T09:35:30.0689328Z Entering 'third_party/NNPACK'
2025-12-04T09:35:30.0737123Z Entering 'third_party/NVTX'
2025-12-04T09:35:30.0785302Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:30.0832580Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:30.0896326Z Entering 'third_party/aiter'
2025-12-04T09:35:30.0945852Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:30.1002622Z Entering 'third_party/benchmark'
2025-12-04T09:35:30.1050045Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:30.1108779Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:30.1157257Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:30.1207089Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:30.1256280Z Entering 'third_party/cutlass'
2025-12-04T09:35:30.1315410Z Entering 'third_party/fbgemm'
2025-12-04T09:35:30.1367215Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:30.1416641Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:30.1473698Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:30.1524483Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:30.1579776Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:30.1625876Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:30.1671800Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:30.1722147Z Entering 'third_party/flash-attention'
2025-12-04T09:35:30.1769927Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:30.1824237Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:30.1884037Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:30.1936314Z Entering 'third_party/fmt'
2025-12-04T09:35:30.1984155Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:30.2032255Z Entering 'third_party/gloo'
2025-12-04T09:35:30.2081039Z Entering 'third_party/googletest'
2025-12-04T09:35:30.2129802Z Entering 'third_party/ideep'
2025-12-04T09:35:30.2176208Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:30.2232639Z Entering 'third_party/ittapi'
2025-12-04T09:35:30.2279580Z Entering 'third_party/kineto'
2025-12-04T09:35:30.2327079Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:30.2374058Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:30.2423418Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:30.2471014Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:30.2519632Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:30.2565649Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:30.2614439Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:30.2664819Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:30.2712221Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:30.2759986Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:30.2807306Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:30.2852807Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:30.2908324Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:30.2961401Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:30.3009119Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:30.3057804Z Entering 'third_party/kleidiai'
2025-12-04T09:35:30.3109748Z Entering 'third_party/mimalloc'
2025-12-04T09:35:30.3156917Z Entering 'third_party/nlohmann'
2025-12-04T09:35:30.3209844Z Entering 'third_party/onnx'
2025-12-04T09:35:30.3277697Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:30.3329732Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:30.3379281Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:30.3425112Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:30.3471716Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:30.3519626Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:30.3566647Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:30.3613903Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:30.3660553Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:30.3705806Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:30.3759287Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:30.3806982Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:30.3876200Z Entering 'third_party/pocketfft'
2025-12-04T09:35:30.3925496Z Entering 'third_party/protobuf'
2025-12-04T09:35:30.3977682Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:30.4025415Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:30.4074445Z Entering 'third_party/psimd'
2025-12-04T09:35:30.4122092Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:30.4169664Z Entering 'third_party/pybind11'
2025-12-04T09:35:30.4217088Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:30.4264343Z Entering 'third_party/sleef'
2025-12-04T09:35:30.4313096Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:30.4360079Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:30.4407192Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:30.4452625Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:30.4499337Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:30.4545400Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:30.4620605Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:'
2025-12-04T09:35:30.4958060Z Entering 'android/libs/fbjni'
2025-12-04T09:35:30.5007170Z Entering 'third_party/FP16'
2025-12-04T09:35:30.5055172Z Entering 'third_party/FXdiv'
2025-12-04T09:35:30.5105120Z Entering 'third_party/NNPACK'
2025-12-04T09:35:30.5152345Z Entering 'third_party/NVTX'
2025-12-04T09:35:30.5200731Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:30.5248615Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:30.5314451Z Entering 'third_party/aiter'
2025-12-04T09:35:30.5363242Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:30.5421195Z Entering 'third_party/benchmark'
2025-12-04T09:35:30.5469226Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:30.5526340Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:30.5574138Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:30.5622593Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:30.5670851Z Entering 'third_party/cutlass'
2025-12-04T09:35:30.5728865Z Entering 'third_party/fbgemm'
2025-12-04T09:35:30.5778912Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:30.5825674Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:30.5880970Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:30.5928885Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:30.5984802Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:30.6031345Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:30.6076895Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:30.6127225Z Entering 'third_party/flash-attention'
2025-12-04T09:35:30.6176732Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:30.6232360Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:30.6291960Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:30.6345564Z Entering 'third_party/fmt'
2025-12-04T09:35:30.6394401Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:30.6445134Z Entering 'third_party/gloo'
2025-12-04T09:35:30.6496681Z Entering 'third_party/googletest'
2025-12-04T09:35:30.6546494Z Entering 'third_party/ideep'
2025-12-04T09:35:30.6594122Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:30.6649601Z Entering 'third_party/ittapi'
2025-12-04T09:35:30.6697187Z Entering 'third_party/kineto'
2025-12-04T09:35:30.6746221Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:30.6795027Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:30.6845038Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:30.6892497Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:30.6938876Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:30.6983922Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:30.7035915Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:30.7083399Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:30.7130408Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:30.7179146Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:30.7226592Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:30.7274821Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:30.7326752Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:30.7378349Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:30.7425187Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:30.7473869Z Entering 'third_party/kleidiai'
2025-12-04T09:35:30.7525226Z Entering 'third_party/mimalloc'
2025-12-04T09:35:30.7573555Z Entering 'third_party/nlohmann'
2025-12-04T09:35:30.7623080Z Entering 'third_party/onnx'
2025-12-04T09:35:30.7691373Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:30.7744735Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:30.7798102Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:30.7844883Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:30.7893271Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:30.7939629Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:30.7987740Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:30.8036234Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:30.8081829Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:30.8126859Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:30.8174638Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:30.8221826Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:30.8293413Z Entering 'third_party/pocketfft'
2025-12-04T09:35:30.8342559Z Entering 'third_party/protobuf'
2025-12-04T09:35:30.8394164Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:30.8441099Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:30.8489341Z Entering 'third_party/psimd'
2025-12-04T09:35:30.8539068Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:30.8586424Z Entering 'third_party/pybind11'
2025-12-04T09:35:30.8635065Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:30.8682645Z Entering 'third_party/sleef'
2025-12-04T09:35:30.8733949Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:30.8782766Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:30.8830051Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:30.8876606Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:30.8923582Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:30.8970389Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:30.9033941Z ##[endgroup]
2025-12-04T09:35:30.9073712Z [command]/usr/bin/git log -1 --format=%H
2025-12-04T09:35:30.9099655Z ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:35:30.9205674Z ##[group]Run cd "${GITHUB_WORKSPACE}"
2025-12-04T09:35:30.9206095Z cd "${GITHUB_WORKSPACE}"
2025-12-04T09:35:30.9206599Z # Clean stale submodule dirs
2025-12-04T09:35:30.9206977Z if [ -z "${NO_SUDO}" ]; then
2025-12-04T09:35:30.9207435Z   sudo git submodule foreach --recursive git clean -ffdx
2025-12-04T09:35:30.9207877Z else
2025-12-04T09:35:30.9208228Z   git submodule foreach --recursive git clean -ffdx
2025-12-04T09:35:30.9208676Z fi
2025-12-04T09:35:30.9216814Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:30.9217264Z env:
2025-12-04T09:35:30.9217526Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:30.9217817Z   NO_SUDO: true
2025-12-04T09:35:30.9218080Z ##[endgroup]
2025-12-04T09:35:30.9589531Z Entering 'android/libs/fbjni'
2025-12-04T09:35:30.9629788Z Entering 'third_party/FP16'
2025-12-04T09:35:30.9666319Z Entering 'third_party/FXdiv'
2025-12-04T09:35:30.9702461Z Entering 'third_party/NNPACK'
2025-12-04T09:35:30.9741780Z Entering 'third_party/NVTX'
2025-12-04T09:35:30.9785633Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T09:35:30.9824203Z Entering 'third_party/XNNPACK'
2025-12-04T09:35:30.9968577Z Entering 'third_party/aiter'
2025-12-04T09:35:31.0020673Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T09:35:31.0147692Z Entering 'third_party/benchmark'
2025-12-04T09:35:31.0187744Z Entering 'third_party/composable_kernel'
2025-12-04T09:35:31.0321569Z Entering 'third_party/cpp-httplib'
2025-12-04T09:35:31.0359797Z Entering 'third_party/cpuinfo'
2025-12-04T09:35:31.0400002Z Entering 'third_party/cudnn_frontend'
2025-12-04T09:35:31.0439272Z Entering 'third_party/cutlass'
2025-12-04T09:35:31.0554224Z Entering 'third_party/fbgemm'
2025-12-04T09:35:31.0622262Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T09:35:31.0656973Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T09:35:31.0789175Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T09:35:31.0828094Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T09:35:31.0938685Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T09:35:31.0976439Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T09:35:31.1009747Z Entering 'third_party/fbgemm/external/json'
2025-12-04T09:35:31.1059824Z Entering 'third_party/flash-attention'
2025-12-04T09:35:31.1105866Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T09:35:31.1218245Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T09:35:31.1320506Z Entering 'third_party/flatbuffers'
2025-12-04T09:35:31.1400661Z Entering 'third_party/fmt'
2025-12-04T09:35:31.1438227Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T09:35:31.1475206Z Entering 'third_party/gloo'
2025-12-04T09:35:31.1512955Z Entering 'third_party/googletest'
2025-12-04T09:35:31.1550835Z Entering 'third_party/ideep'
2025-12-04T09:35:31.1584596Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T09:35:31.1681987Z Entering 'third_party/ittapi'
2025-12-04T09:35:31.1721005Z Entering 'third_party/kineto'
2025-12-04T09:35:31.1760840Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T09:35:31.1801899Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T09:35:31.1851996Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T09:35:31.1887237Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T09:35:31.1922857Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T09:35:31.1955825Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T09:35:31.1991434Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T09:35:31.2029495Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T09:35:31.2066935Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T09:35:31.2118739Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T09:35:31.2153959Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T09:35:31.2190138Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:31.2246041Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:31.2290066Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T09:35:31.2325600Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T09:35:31.2366191Z Entering 'third_party/kleidiai'
2025-12-04T09:35:31.2410864Z Entering 'third_party/mimalloc'
2025-12-04T09:35:31.2448931Z Entering 'third_party/nlohmann'
2025-12-04T09:35:31.2500243Z Entering 'third_party/onnx'
2025-12-04T09:35:31.2875672Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T09:35:31.2917792Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T09:35:31.2981113Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T09:35:31.3018016Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T09:35:31.3054687Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T09:35:31.3088702Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T09:35:31.3135972Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T09:35:31.3170364Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T09:35:31.3205362Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T09:35:31.3241246Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T09:35:31.3293958Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T09:35:31.3334358Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T09:35:31.3627175Z Entering 'third_party/pocketfft'
2025-12-04T09:35:31.3662967Z Entering 'third_party/protobuf'
2025-12-04T09:35:31.3752840Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T09:35:31.3787279Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T09:35:31.3828736Z Entering 'third_party/psimd'
2025-12-04T09:35:31.3863673Z Entering 'third_party/pthreadpool'
2025-12-04T09:35:31.3901295Z Entering 'third_party/pybind11'
2025-12-04T09:35:31.3939557Z Entering 'third_party/python-peachpy'
2025-12-04T09:35:31.3976421Z Entering 'third_party/sleef'
2025-12-04T09:35:31.4016097Z Entering 'third_party/tensorpipe'
2025-12-04T09:35:31.4054271Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T09:35:31.4090762Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T09:35:31.4125102Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T09:35:31.4163519Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T09:35:31.4197172Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T09:35:31.4372753Z Prepare all required actions
2025-12-04T09:35:31.4373399Z Getting action download info
2025-12-04T09:35:31.6502455Z ##[group]Run ./.github/actions/setup-linux
2025-12-04T09:35:31.6502823Z env:
2025-12-04T09:35:31.6503077Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:31.6503372Z ##[endgroup]
2025-12-04T09:35:31.6552920Z ##[group]Run set -euo pipefail
2025-12-04T09:35:31.6553571Z [36;1mset -euo pipefail[0m
2025-12-04T09:35:31.6553999Z [36;1mfunction get_ec2_metadata() {[0m
2025-12-04T09:35:31.6554568Z [36;1m  # Pulled from instance metadata endpoint for EC2[0m
2025-12-04T09:35:31.6555449Z [36;1m  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html[0m
2025-12-04T09:35:31.6556216Z [36;1m  category=$1[0m
2025-12-04T09:35:31.6556746Z [36;1m  # If it is GCP runner (runner name contains gcp), do not run this[0m
2025-12-04T09:35:31.6557465Z [36;1m  runner_name_str=i-03bbda7791efb68ed[0m
2025-12-04T09:35:31.6558075Z [36;1m  if [[ -f /.inarc ]]; then[0m
2025-12-04T09:35:31.6558559Z [36;1m    echo "ARC Runner, no info on ec2 metadata"[0m
2025-12-04T09:35:31.6559202Z [36;1m  elif [[ $runner_name_str == *"gcp"* ]]; then[0m
2025-12-04T09:35:31.6559859Z [36;1m    echo "Runner is from Google Cloud Platform, No info on ec2 metadata"[0m
2025-12-04T09:35:31.6560401Z [36;1m  else[0m
2025-12-04T09:35:31.6561598Z [36;1m    curl -H "X-aws-ec2-metadata-token: $(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 30")" -fsSL "http://169.254.169.254/latest/meta-data/${category}"[0m
2025-12-04T09:35:31.6562938Z [36;1m  fi[0m
2025-12-04T09:35:31.6563322Z [36;1m}[0m
2025-12-04T09:35:31.6563715Z [36;1mecho "ami-id: $(get_ec2_metadata ami-id)"[0m
2025-12-04T09:35:31.6564323Z [36;1mecho "instance-id: $(get_ec2_metadata instance-id)"[0m
2025-12-04T09:35:31.6565022Z [36;1mecho "instance-type: $(get_ec2_metadata instance-type)"[0m
2025-12-04T09:35:31.6565623Z [36;1mecho "system info $(uname -a)"[0m
2025-12-04T09:35:31.6573753Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:31.6574328Z env:
2025-12-04T09:35:31.6574688Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:31.6575091Z ##[endgroup]
2025-12-04T09:35:31.6738293Z ami-id: ami-08982f1c5bf93d976
2025-12-04T09:35:31.6856093Z instance-id: i-03bbda7791efb68ed
2025-12-04T09:35:31.6970651Z instance-type: g4dn.4xlarge
2025-12-04T09:35:31.6982670Z system info Linux ip-10-0-76-64.ec2.internal 6.1.150-174.273.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Sep  9 12:21:26 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
2025-12-04T09:35:31.7006707Z ##[group]Run if [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi
2025-12-04T09:35:31.7007279Z [36;1mif [ -f /usr/bin/nvidia-smi ]; then nvidia-smi; fi[0m
2025-12-04T09:35:31.7014759Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:31.7015210Z env:
2025-12-04T09:35:31.7015450Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:31.7015760Z ##[endgroup]
2025-12-04T09:35:33.0650257Z Thu Dec  4 09:35:33 2025       
2025-12-04T09:35:33.0651519Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:35:33.0652175Z | NVIDIA-SMI 580.82.07              Driver Version: 580.82.07      CUDA Version: 13.0     |
2025-12-04T09:35:33.0652814Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:35:33.0653455Z | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:35:33.0654132Z | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:35:33.0654685Z |                                         |                        |               MIG M. |
2025-12-04T09:35:33.0655087Z |=========================================+========================+======================|
2025-12-04T09:35:33.0750629Z |   0  Tesla T4                       Off |   00000000:00:1E.0 Off |                    0 |
2025-12-04T09:35:33.0751528Z | N/A   32C    P0             28W /   70W |       0MiB /  15360MiB |      8%      Default |
2025-12-04T09:35:33.0752013Z |                                         |                        |                  N/A |
2025-12-04T09:35:33.0752501Z +-----------------------------------------+------------------------+----------------------+
2025-12-04T09:35:33.0752905Z 
2025-12-04T09:35:33.0753129Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:35:33.0753674Z | Processes:                                                                              |
2025-12-04T09:35:33.0754234Z |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
2025-12-04T09:35:33.0754743Z |        ID   ID                                                               Usage      |
2025-12-04T09:35:33.0755175Z |=========================================================================================|
2025-12-04T09:35:33.0755724Z |  No running processes found                                                             |
2025-12-04T09:35:33.0756323Z +-----------------------------------------------------------------------------------------+
2025-12-04T09:35:33.4867783Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:35:33.4868919Z [36;1mecho "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"[0m
2025-12-04T09:35:33.4878400Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:33.4878852Z env:
2025-12-04T09:35:33.4879100Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:33.4879411Z ##[endgroup]
2025-12-04T09:35:33.4940391Z ##[group]Run if systemctl is-active --quiet docker; then
2025-12-04T09:35:33.4940929Z [36;1mif systemctl is-active --quiet docker; then[0m
2025-12-04T09:35:33.4941670Z [36;1m    echo "Docker daemon is running...";[0m
2025-12-04T09:35:33.4942090Z [36;1melse[0m
2025-12-04T09:35:33.4942521Z [36;1m    echo "Starting docker daemon..." && sudo systemctl start docker;[0m
2025-12-04T09:35:33.4943031Z [36;1mfi[0m
2025-12-04T09:35:33.4950025Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:33.4950477Z env:
2025-12-04T09:35:33.4950733Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:33.4951026Z ##[endgroup]
2025-12-04T09:35:33.5042874Z Docker daemon is running...
2025-12-04T09:35:33.5089381Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T09:35:33.5089724Z with:
2025-12-04T09:35:33.5089957Z   shell: bash
2025-12-04T09:35:33.5090214Z   timeout_minutes: 5
2025-12-04T09:35:33.5090497Z   max_attempts: 3
2025-12-04T09:35:33.5090759Z   retry_wait_seconds: 30
2025-12-04T09:35:33.5093518Z   command: AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\")
aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
    --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"

# For LF Runners we need to make sure we also login to Meta's ECR docker registry too.
META_AWS_ACCOUNT_ID=308535385114
if [ "$AWS_ACCOUNT_ID" != "$META_AWS_ACCOUNT_ID" ] ; then
    aws ecr get-login-password --region "$AWS_DEFAULT_REGION" | docker login --username AWS \
        --password-stdin "$META_AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com"
fi

2025-12-04T09:35:33.5096311Z   polling_interval_seconds: 1
2025-12-04T09:35:33.5096650Z   warning_on_retry: true
2025-12-04T09:35:33.5096962Z   continue_on_error: false
2025-12-04T09:35:33.5097244Z env:
2025-12-04T09:35:33.5097483Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:33.5097793Z   AWS_RETRY_MODE: standard
2025-12-04T09:35:33.5098083Z   AWS_MAX_ATTEMPTS: 5
2025-12-04T09:35:33.5098372Z   AWS_DEFAULT_REGION: us-east-1
2025-12-04T09:35:33.5098690Z ##[endgroup]
2025-12-04T09:35:34.8338349Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:35:34.8339378Z Configure a credential helper to remove this warning. See
2025-12-04T09:35:34.8340054Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:35:34.8340529Z 
2025-12-04T09:35:34.8340650Z Login Succeeded
2025-12-04T09:35:35.6047537Z Command completed after 1 attempt(s).
2025-12-04T09:35:35.6103178Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:35:35.6103816Z [36;1menv | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"[0m
2025-12-04T09:35:35.6104374Z [36;1menv | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"[0m
2025-12-04T09:35:35.6113877Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:35.6114318Z env:
2025-12-04T09:35:35.6114595Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:35.6114890Z ##[endgroup]
2025-12-04T09:35:35.6201833Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T09:35:35.6202631Z [36;1m# ignore expansion of "docker ps -q" since it could be empty[0m
2025-12-04T09:35:35.6203166Z [36;1m# shellcheck disable=SC2046[0m
2025-12-04T09:35:35.6203568Z [36;1mdocker stop $(docker ps -q) || true[0m
2025-12-04T09:35:35.6203985Z [36;1m# Prune all of the docker images[0m
2025-12-04T09:35:35.6204363Z [36;1mdocker system prune -af[0m
2025-12-04T09:35:35.6211367Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:35.6211821Z env:
2025-12-04T09:35:35.6212093Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:35.6212390Z ##[endgroup]
2025-12-04T09:35:35.6496000Z "docker stop" requires at least 1 argument.
2025-12-04T09:35:35.6496484Z See 'docker stop --help'.
2025-12-04T09:35:35.6496703Z 
2025-12-04T09:35:35.6496893Z Usage:  docker stop [OPTIONS] CONTAINER [CONTAINER...]
2025-12-04T09:35:35.6497211Z 
2025-12-04T09:35:35.6513285Z Stop one or more running containers
2025-12-04T09:35:35.6715774Z Total reclaimed space: 0B
2025-12-04T09:35:35.6926493Z ##[group]Run pytorch/test-infra/.github/actions/calculate-docker-image@main
2025-12-04T09:35:35.6927070Z with:
2025-12-04T09:35:35.6928017Z   docker-image-name: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:35.6929081Z   use-custom-docker-registry: true
2025-12-04T09:35:35.6929449Z   docker-build-dir: .ci/docker
2025-12-04T09:35:35.6929797Z   docker-build-script: ./build.sh
2025-12-04T09:35:35.6930307Z   working-directory: .
2025-12-04T09:35:35.6930702Z   docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:35.6931173Z   force-push: false
2025-12-04T09:35:35.6931436Z env:
2025-12-04T09:35:35.6931664Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:35.6931965Z ##[endgroup]
2025-12-04T09:35:35.6953083Z ##[group]Run set -ex
2025-12-04T09:35:35.6953419Z [36;1mset -ex[0m
2025-12-04T09:35:35.6953687Z [36;1m[0m
2025-12-04T09:35:35.6954282Z [36;1m# If the docker build directory or the build script doesn't exist, the action will[0m
2025-12-04T09:35:35.6955077Z [36;1m# gracefully return the docker image name as it is.  Pulling docker image in Linux[0m
2025-12-04T09:35:35.6955757Z [36;1m# job could then download the pre-built image as usual[0m
2025-12-04T09:35:35.6956573Z [36;1mif [[ -d "${DOCKER_BUILD_DIR}" ]] && [[ -f "${DOCKER_BUILD_DIR}/${DOCKER_BUILD_SCRIPT}" ]] && [[ "${USE_CUSTOM_DOCKER_REGISTRY}" == "true" ]]; then[0m
2025-12-04T09:35:35.6957324Z [36;1m  echo "skip=false" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:35.6957724Z [36;1melse[0m
2025-12-04T09:35:35.6958033Z [36;1m  echo "skip=true" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:35.6958551Z [36;1m  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:35.6959017Z [36;1m[0m
2025-12-04T09:35:35.6959676Z [36;1m  echo "Not using custom ECR registry.  Either it was not requested or there is no Docker build script in the ${REPO_NAME} repo..."[0m
2025-12-04T09:35:35.6960433Z [36;1m  exit 0[0m
2025-12-04T09:35:35.6960677Z [36;1mfi[0m
2025-12-04T09:35:35.6960918Z [36;1m[0m
2025-12-04T09:35:35.6961308Z [36;1mif [[ "${DOCKER_IMAGE_NAME}" == *"${DOCKER_REGISTRY}/${REPO_NAME}"* ]]; then[0m
2025-12-04T09:35:35.6962021Z [36;1m  # The docker image name already includes the ECR prefix and tag, so we can just[0m
2025-12-04T09:35:35.6962719Z [36;1m  # use it as it is, but first let's extract the tag[0m
2025-12-04T09:35:35.6963298Z [36;1m  DOCKER_TAG=$(echo "${DOCKER_IMAGE_NAME}" | awk -F '[:,]' '{print $2}')[0m
2025-12-04T09:35:35.6963909Z [36;1m  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:35.6964489Z [36;1m  echo "docker-image=${DOCKER_IMAGE_NAME}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:35.6964957Z [36;1melse[0m
2025-12-04T09:35:35.6965266Z [36;1m  if [[ "${DOCKER_IMAGE_NAME}" == *:* ]]; then[0m
2025-12-04T09:35:35.6965718Z [36;1m    CUSTOM_TAG_PREFIX=${DOCKER_IMAGE_NAME#*:}[0m
2025-12-04T09:35:35.6966171Z [36;1m    DOCKER_IMAGE_NAME=${DOCKER_IMAGE_NAME%%:*}[0m
2025-12-04T09:35:35.6966567Z [36;1m  fi[0m
2025-12-04T09:35:35.6967097Z [36;1m  DOCKER_TAG=${CUSTOM_TAG_PREFIX:+${CUSTOM_TAG_PREFIX}-}$(git rev-parse HEAD:"${DOCKER_BUILD_DIR}")[0m
2025-12-04T09:35:35.6967821Z [36;1m  echo "docker-tag=${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:35.6968566Z [36;1m  echo "docker-image=${DOCKER_REGISTRY}/${REPO_NAME}/${DOCKER_IMAGE_NAME}:${DOCKER_TAG}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:35.6969396Z [36;1m  echo "custom-tag-prefix=${CUSTOM_TAG_PREFIX}" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:35.6969907Z [36;1mfi[0m
2025-12-04T09:35:35.6976902Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:35.6977342Z env:
2025-12-04T09:35:35.6977589Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:35.6977882Z   REPO_NAME: pytorch
2025-12-04T09:35:35.6978972Z   DOCKER_IMAGE_NAME: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:35.6980025Z   DOCKER_BUILD_DIR: .ci/docker
2025-12-04T09:35:35.6980364Z   DOCKER_BUILD_SCRIPT: ./build.sh
2025-12-04T09:35:35.6980797Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:35.6981277Z   USE_CUSTOM_DOCKER_REGISTRY: true
2025-12-04T09:35:35.6981620Z   CUSTOM_TAG_PREFIX: 
2025-12-04T09:35:35.6981888Z ##[endgroup]
2025-12-04T09:35:35.7010273Z + [[ -d .ci/docker ]]
2025-12-04T09:35:35.7010597Z + [[ -f .ci/docker/./build.sh ]]
2025-12-04T09:35:35.7011098Z + [[ true == \t\r\u\e ]]
2025-12-04T09:35:35.7011384Z + echo skip=false
2025-12-04T09:35:35.7012670Z + [[ 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a == *\3\0\8\5\3\5\3\8\5\1\1\4\.\d\k\r\.\e\c\r\.\u\s\-\e\a\s\t\-\1\.\a\m\a\z\o\n\a\w\s\.\c\o\m\/\p\y\t\o\r\c\h* ]]
2025-12-04T09:35:35.7019017Z ++ echo 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:35.7020366Z ++ awk -F '[:,]' '{print $2}'
2025-12-04T09:35:35.7044755Z + DOCKER_TAG=pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:35.7046122Z + echo docker-tag=pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:35.7047633Z + echo docker-image=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:35.7073484Z ##[group]Run set +e
2025-12-04T09:35:35.7073842Z [36;1mset +e[0m
2025-12-04T09:35:35.7074184Z [36;1mset -x[0m
2025-12-04T09:35:35.7074462Z [36;1m[0m
2025-12-04T09:35:35.7074709Z [36;1mlogin() {[0m
2025-12-04T09:35:35.7075260Z [36;1m  aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1"[0m
2025-12-04T09:35:35.7075879Z [36;1m}[0m
2025-12-04T09:35:35.7076126Z [36;1m[0m
2025-12-04T09:35:35.7076351Z [36;1mretry () {[0m
2025-12-04T09:35:35.7076665Z [36;1m  $*  || (sleep 1 && $*) || (sleep 2 && $*)[0m
2025-12-04T09:35:35.7077029Z [36;1m}[0m
2025-12-04T09:35:35.7077247Z [36;1m[0m
2025-12-04T09:35:35.7077512Z [36;1mretry login "${DOCKER_REGISTRY}"[0m
2025-12-04T09:35:35.7077861Z [36;1m[0m
2025-12-04T09:35:35.7078105Z [36;1mSTART_TIME=$(date +%s)[0m
2025-12-04T09:35:35.7078430Z [36;1m# Wait up to 120 minutes[0m
2025-12-04T09:35:35.7078850Z [36;1mwhile [[ $(( $(date +%s) - 7200 )) -lt $START_TIME ]]; do[0m
2025-12-04T09:35:35.7079441Z [36;1m  # Check if image already exists, if it does then skip building it[0m
2025-12-04T09:35:35.7080010Z [36;1m  if docker manifest inspect "${DOCKER_IMAGE}"; then[0m
2025-12-04T09:35:35.7080438Z [36;1m    exit 0[0m
2025-12-04T09:35:35.7080704Z [36;1m  fi[0m
2025-12-04T09:35:35.7080935Z [36;1m[0m
2025-12-04T09:35:35.7081388Z [36;1m  # NB: This flag is used by Docker build workflow to push the image to ECR, so we can[0m
2025-12-04T09:35:35.7082172Z [36;1m  # use this to differentiate between the Docker build and regular build jobs. For the[0m
2025-12-04T09:35:35.7083092Z [36;1m  # latter, it will wait for the Docker images to become available before continuing[0m
2025-12-04T09:35:35.7083693Z [36;1m  if [ "${DOCKER_PUSH:-false}" == "true" ]; then[0m
2025-12-04T09:35:35.7084166Z [36;1m    # It's a Docker build job, let's build the image[0m
2025-12-04T09:35:35.7084573Z [36;1m    break[0m
2025-12-04T09:35:35.7084842Z [36;1m  else[0m
2025-12-04T09:35:35.7085258Z [36;1m    # It's a regular build job, wait for the image to become available[0m
2025-12-04T09:35:35.7085734Z [36;1m    sleep 300[0m
2025-12-04T09:35:35.7086015Z [36;1m  fi[0m
2025-12-04T09:35:35.7086266Z [36;1mdone[0m
2025-12-04T09:35:35.7086497Z [36;1m[0m
2025-12-04T09:35:35.7086911Z [36;1m# NB: This part requires a full checkout. Otherwise, the merge base will[0m
2025-12-04T09:35:35.7087769Z [36;1m# be empty.  The default action would be to continue rebuild the image[0m
2025-12-04T09:35:35.7088387Z [36;1mif [[ "$BASE_REVISION" = "$(git rev-parse HEAD)" ]]; then[0m
2025-12-04T09:35:35.7088911Z [36;1m  # if we're on the base branch then use the parent commit[0m
2025-12-04T09:35:35.7089394Z [36;1m  MERGE_BASE=$(git rev-parse HEAD~)[0m
2025-12-04T09:35:35.7089765Z [36;1melse[0m
2025-12-04T09:35:35.7090132Z [36;1m  # otherwise we're on a PR, so use the most recent base commit[0m
2025-12-04T09:35:35.7090700Z [36;1m  MERGE_BASE=$(git merge-base HEAD "$BASE_REVISION")[0m
2025-12-04T09:35:35.7091213Z [36;1mfi[0m
2025-12-04T09:35:35.7091439Z [36;1m[0m
2025-12-04T09:35:35.7091718Z [36;1mif [[ -z "${MERGE_BASE}" ]]; then[0m
2025-12-04T09:35:35.7092139Z [36;1m  echo "rebuild=true" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:35.7092524Z [36;1m[0m
2025-12-04T09:35:35.7093060Z [36;1m  echo "Finding merge base only works with full checkout, please set fetch-depth to 0, continuing ..."[0m
2025-12-04T09:35:35.7093717Z [36;1m  exit 0[0m
2025-12-04T09:35:35.7093974Z [36;1mfi[0m
2025-12-04T09:35:35.7094196Z [36;1m[0m
2025-12-04T09:35:35.7094549Z [36;1mif ! git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}"; then[0m
2025-12-04T09:35:35.7095359Z [36;1m  echo "Directory '${DOCKER_BUILD_DIR}' not found in commit $MERGE_BASE, you should rebase onto a more recent commit"[0m
2025-12-04T09:35:35.7096054Z [36;1m  exit 1[0m
2025-12-04T09:35:35.7096293Z [36;1mfi[0m
2025-12-04T09:35:35.7096528Z [36;1m[0m
2025-12-04T09:35:35.7096947Z [36;1mPREVIOUS_DOCKER_TAG=$(git rev-parse "${MERGE_BASE}:${DOCKER_BUILD_DIR}")[0m
2025-12-04T09:35:35.7097727Z [36;1m# If no image exists but the hash is the same as the previous hash then we should error out here[0m
2025-12-04T09:35:35.7098425Z [36;1mif [[ "${PREVIOUS_DOCKER_TAG}" == "${DOCKER_TAG}" ]]; then[0m
2025-12-04T09:35:35.7099235Z [36;1m  echo "WARNING: Something has gone wrong and the previous image isn't available for the merge-base of your branch"[0m
2025-12-04T09:35:35.7100152Z [36;1m  echo "         Will re-build docker image to store in local cache, TTS may be longer"[0m
2025-12-04T09:35:35.7100676Z [36;1mfi[0m
2025-12-04T09:35:35.7101158Z [36;1m[0m
2025-12-04T09:35:35.7101462Z [36;1mecho "rebuild=true" >> "${GITHUB_OUTPUT}"[0m
2025-12-04T09:35:35.7108173Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:35.7108614Z env:
2025-12-04T09:35:35.7108861Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:35.7109179Z   DOCKER_BUILD_DIR: .ci/docker
2025-12-04T09:35:35.7109570Z   BASE_REVISION: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:35:35.7110675Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:35.7112010Z   DOCKER_TAG: pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:35.7112807Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:35.7113250Z   DOCKER_PUSH: 
2025-12-04T09:35:35.7113510Z ##[endgroup]
2025-12-04T09:35:35.7140821Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:35.7141600Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:35.7145516Z + aws ecr get-login-password --region us-east-1
2025-12-04T09:35:35.7146932Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:36.3142029Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:35:36.3142792Z Configure a credential helper to remove this warning. See
2025-12-04T09:35:36.3143466Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:35:36.3143923Z 
2025-12-04T09:35:36.3144040Z Login Succeeded
2025-12-04T09:35:36.3158600Z ++ date +%s
2025-12-04T09:35:36.3169423Z + START_TIME=1764840936
2025-12-04T09:35:36.3172977Z ++ date +%s
2025-12-04T09:35:36.3185325Z + [[ 1764833736 -lt 1764840936 ]]
2025-12-04T09:35:36.3186681Z + docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:36.5440470Z {
2025-12-04T09:35:36.5440866Z 	"schemaVersion": 2,
2025-12-04T09:35:36.5441396Z 	"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
2025-12-04T09:35:36.5441908Z 	"config": {
2025-12-04T09:35:36.5442362Z 		"mediaType": "application/vnd.docker.container.image.v1+json",
2025-12-04T09:35:36.5442842Z 		"size": 34787,
2025-12-04T09:35:36.5443593Z 		"digest": "sha256:5465aa79632b68f6240c23f0d0b021df4d0fd595333b61a40d36a0cf73656024"
2025-12-04T09:35:36.5444142Z 	},
2025-12-04T09:35:36.5444370Z 	"layers": [
2025-12-04T09:35:36.5444599Z 		{
2025-12-04T09:35:36.5444972Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5445460Z 			"size": 30447951,
2025-12-04T09:35:36.5445967Z 			"digest": "sha256:63e5bc7682b85ae57a1221210f64d62e7a90b0a30f19af4ca734b8242ae49d63"
2025-12-04T09:35:36.5446507Z 		},
2025-12-04T09:35:36.5446723Z 		{
2025-12-04T09:35:36.5447103Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5447576Z 			"size": 1554,
2025-12-04T09:35:36.5448043Z 			"digest": "sha256:835841cca3b7e1464290cdb78e48773e03583413fbed852c3cc5165a392ea44d"
2025-12-04T09:35:36.5448593Z 		},
2025-12-04T09:35:36.5448794Z 		{
2025-12-04T09:35:36.5449273Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5449763Z 			"size": 313276213,
2025-12-04T09:35:36.5450273Z 			"digest": "sha256:1bf1bb125deaa5b8a3adf121671e87ba2fa7e229f9eb1dff7ade581cb737175a"
2025-12-04T09:35:36.5450825Z 		},
2025-12-04T09:35:36.5451041Z 		{
2025-12-04T09:35:36.5451435Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5451906Z 			"size": 787,
2025-12-04T09:35:36.5452382Z 			"digest": "sha256:b21856d1bf420da6fa8ec7331b82ab355d4f4178644e7d3a3d3d0fbc3610109a"
2025-12-04T09:35:36.5452940Z 		},
2025-12-04T09:35:36.5453142Z 		{
2025-12-04T09:35:36.5453515Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5453997Z 			"size": 106,
2025-12-04T09:35:36.5454468Z 			"digest": "sha256:848ba2c095e2b9e6acfb0ecf077adb526fb2fa82ed44cf6648ebde97f296f8ec"
2025-12-04T09:35:36.5455027Z 		},
2025-12-04T09:35:36.5455243Z 		{
2025-12-04T09:35:36.5455601Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5456082Z 			"size": 704,
2025-12-04T09:35:36.5456558Z 			"digest": "sha256:029495b23122c840ca0e52d487afa8d2c4dbf1991cd7f204ec3e434dcf947bf4"
2025-12-04T09:35:36.5457110Z 		},
2025-12-04T09:35:36.5457319Z 		{
2025-12-04T09:35:36.5457693Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5458175Z 			"size": 1216,
2025-12-04T09:35:36.5458638Z 			"digest": "sha256:073bb82063cfba4639b11fea43753dbb128f9238353189fc02d2e2aa0b2ad359"
2025-12-04T09:35:36.5459188Z 		},
2025-12-04T09:35:36.5459406Z 		{
2025-12-04T09:35:36.5459765Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5460248Z 			"size": 484,
2025-12-04T09:35:36.5460713Z 			"digest": "sha256:59b63930883363c7d2aaab27cc61555d9f3e119dc18247a8624c98ebdaa354a5"
2025-12-04T09:35:36.5461286Z 		},
2025-12-04T09:35:36.5461501Z 		{
2025-12-04T09:35:36.5461873Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5462344Z 			"size": 110362071,
2025-12-04T09:35:36.5462827Z 			"digest": "sha256:1c6177b2970db2d7743b4337c420a35f2ec79f338c30d97d534a1f0987c00913"
2025-12-04T09:35:36.5463373Z 		},
2025-12-04T09:35:36.5463589Z 		{
2025-12-04T09:35:36.5463945Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5464427Z 			"size": 4961,
2025-12-04T09:35:36.5464913Z 			"digest": "sha256:fabe466dd5f33c3209a56abf5cb46b9b07fe21c57fb43b98e13308c8665c0864"
2025-12-04T09:35:36.5465456Z 		},
2025-12-04T09:35:36.5465675Z 		{
2025-12-04T09:35:36.5466226Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5466699Z 			"size": 1755,
2025-12-04T09:35:36.5467173Z 			"digest": "sha256:2b5a11b41761d8ea3b829e4772e4064cb6c4e4989126af324d0057661e4493a1"
2025-12-04T09:35:36.5467719Z 		},
2025-12-04T09:35:36.5467923Z 		{
2025-12-04T09:35:36.5468304Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5468784Z 			"size": 724,
2025-12-04T09:35:36.5469243Z 			"digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084"
2025-12-04T09:35:36.5469841Z 		},
2025-12-04T09:35:36.5470055Z 		{
2025-12-04T09:35:36.5470423Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5470891Z 			"size": 544,
2025-12-04T09:35:36.5471357Z 			"digest": "sha256:dc0780902fca810498f16efa71f8e5990385f141a0cfcc552616a4acc434f79a"
2025-12-04T09:35:36.5471905Z 		},
2025-12-04T09:35:36.5472105Z 		{
2025-12-04T09:35:36.5472483Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5472969Z 			"size": 3185191720,
2025-12-04T09:35:36.5473454Z 			"digest": "sha256:5b09a2b135c8e540e2b9374b68991afdd63a5dfaba75fb44efe054a591f400c2"
2025-12-04T09:35:36.5474006Z 		},
2025-12-04T09:35:36.5474220Z 		{
2025-12-04T09:35:36.5474580Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5475062Z 			"size": 32,
2025-12-04T09:35:36.5475538Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:36.5476091Z 		},
2025-12-04T09:35:36.5476295Z 		{
2025-12-04T09:35:36.5476671Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5477150Z 			"size": 396,
2025-12-04T09:35:36.5477623Z 			"digest": "sha256:5bfdaeb5578d6ffcd7db29c48303cbceb13c591210feaa216a8daa7a6d445b4b"
2025-12-04T09:35:36.5478184Z 		},
2025-12-04T09:35:36.5478396Z 		{
2025-12-04T09:35:36.5478754Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5479242Z 			"size": 236865,
2025-12-04T09:35:36.5479712Z 			"digest": "sha256:0ef42867f370b8a14b8c301388793b78a0bd2533bb2a317b129b03c8667dc767"
2025-12-04T09:35:36.5480271Z 		},
2025-12-04T09:35:36.5480475Z 		{
2025-12-04T09:35:36.5480851Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5481338Z 			"size": 230,
2025-12-04T09:35:36.5481792Z 			"digest": "sha256:446083e497f322789c2d87933a77fb2dfd94e18d2e85f6d4362e6e9521b82c4e"
2025-12-04T09:35:36.5482461Z 		},
2025-12-04T09:35:36.5482684Z 		{
2025-12-04T09:35:36.5483050Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5483541Z 			"size": 3043500,
2025-12-04T09:35:36.5484030Z 			"digest": "sha256:d8a170bef0f4e0e28f5ba0952320dd465552adf74f0864b4f47cc11f4c4f82f7"
2025-12-04T09:35:36.5484589Z 		},
2025-12-04T09:35:36.5484793Z 		{
2025-12-04T09:35:36.5485170Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5485655Z 			"size": 1472,
2025-12-04T09:35:36.5486132Z 			"digest": "sha256:e2b6cd6a5bd0418a1e4aca3f37942324d4d9f9b0177597e37fc8d1a5626048e1"
2025-12-04T09:35:36.5486689Z 		},
2025-12-04T09:35:36.5486909Z 		{
2025-12-04T09:35:36.5487268Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5487745Z 			"size": 481,
2025-12-04T09:35:36.5488213Z 			"digest": "sha256:93efc0181a22218a544413f1d57e9e0e7a0f492e41bef598084c5b9177e3987a"
2025-12-04T09:35:36.5488746Z 		},
2025-12-04T09:35:36.5488961Z 		{
2025-12-04T09:35:36.5489339Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5489817Z 			"size": 202,
2025-12-04T09:35:36.5490286Z 			"digest": "sha256:7454c938f17425bcf167ad28a62b42b95f638a7d2cf0840885cfe5ffe8480a12"
2025-12-04T09:35:36.5490829Z 		},
2025-12-04T09:35:36.5491039Z 		{
2025-12-04T09:35:36.5491396Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5491874Z 			"size": 607,
2025-12-04T09:35:36.5492435Z 			"digest": "sha256:4d57ff55f6d4161cb6c29e2c0b08d47e65898427db3938479158684899f0023d"
2025-12-04T09:35:36.5492968Z 		},
2025-12-04T09:35:36.5493184Z 		{
2025-12-04T09:35:36.5493554Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5494028Z 			"size": 6243016141,
2025-12-04T09:35:36.5494523Z 			"digest": "sha256:b0301534b4a58072d5b140b08a7608bbead41d126fa29fdc78c1e8a43ebb865d"
2025-12-04T09:35:36.5495070Z 		},
2025-12-04T09:35:36.5495272Z 		{
2025-12-04T09:35:36.5495645Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5496228Z 			"size": 829,
2025-12-04T09:35:36.5496699Z 			"digest": "sha256:1969e15d0c13874ea5883ed829235a19ef6dc21c8aa6172032b78a8ffa6ff262"
2025-12-04T09:35:36.5497232Z 		},
2025-12-04T09:35:36.5497445Z 		{
2025-12-04T09:35:36.5497814Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5498285Z 			"size": 33450177,
2025-12-04T09:35:36.5498784Z 			"digest": "sha256:73180a0f2d5a961a0cc0ba2c3cf375fdcfb43ae5e4e5c63a000c4b4366d52a64"
2025-12-04T09:35:36.5499338Z 		},
2025-12-04T09:35:36.5499573Z 		{
2025-12-04T09:35:36.5499950Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5500434Z 			"size": 104,
2025-12-04T09:35:36.5501145Z 			"digest": "sha256:ad81b25cb69f8cf42a4a96678a64b7d0598a8f95236a3e63d1fec4e53edff613"
2025-12-04T09:35:36.5501717Z 		},
2025-12-04T09:35:36.5501937Z 		{
2025-12-04T09:35:36.5502301Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5502785Z 			"size": 1496,
2025-12-04T09:35:36.5503263Z 			"digest": "sha256:8165374f8dccf88a7791a5d31afbe29e4d4542b4f1cf1904945e07f9af6bf8ba"
2025-12-04T09:35:36.5503817Z 		},
2025-12-04T09:35:36.5504018Z 		{
2025-12-04T09:35:36.5504389Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5504869Z 			"size": 458786969,
2025-12-04T09:35:36.5505353Z 			"digest": "sha256:7779c0bb9be2030df9060b526b98d0afeed1ce5b61ee0530321ef04a4e145e8c"
2025-12-04T09:35:36.5505909Z 		},
2025-12-04T09:35:36.5506123Z 		{
2025-12-04T09:35:36.5506479Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5506960Z 			"size": 164,
2025-12-04T09:35:36.5507427Z 			"digest": "sha256:4d0a1c027262ed8c83181b931b64afa1c41c3cac97580231c4cae3a524ebd7d5"
2025-12-04T09:35:36.5507960Z 		},
2025-12-04T09:35:36.5508174Z 		{
2025-12-04T09:35:36.5508551Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5509021Z 			"size": 346,
2025-12-04T09:35:36.5509486Z 			"digest": "sha256:a51e0dab2d596e6563483f27c12660007160847d177ba4c31812a8f44ada5754"
2025-12-04T09:35:36.5510022Z 		},
2025-12-04T09:35:36.5510236Z 		{
2025-12-04T09:35:36.5510595Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5511071Z 			"size": 32,
2025-12-04T09:35:36.5511542Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:36.5512080Z 		},
2025-12-04T09:35:36.5512298Z 		{
2025-12-04T09:35:36.5512670Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5513135Z 			"size": 106,
2025-12-04T09:35:36.5513617Z 			"digest": "sha256:3eb6d4ff040b8761b1e3e1da768bdb884ce0e5324e3d0f6471b0a8b2ddf4736f"
2025-12-04T09:35:36.5514173Z 		},
2025-12-04T09:35:36.5514374Z 		{
2025-12-04T09:35:36.5514746Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5515227Z 			"size": 424,
2025-12-04T09:35:36.5515700Z 			"digest": "sha256:b168858b85373f8ddca549d79267a06de4fa945d04bf791c55c9ddc93957fa3c"
2025-12-04T09:35:36.5516245Z 		},
2025-12-04T09:35:36.5516463Z 		{
2025-12-04T09:35:36.5516839Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5517310Z 			"size": 19309367,
2025-12-04T09:35:36.5517795Z 			"digest": "sha256:d77a39278026a8899e2f97643918bdcf96e711ca26951880b4841b319dc71321"
2025-12-04T09:35:36.5518336Z 		},
2025-12-04T09:35:36.5518539Z 		{
2025-12-04T09:35:36.5519082Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5519574Z 			"size": 108,
2025-12-04T09:35:36.5520052Z 			"digest": "sha256:36fbd357280b6b40e90f36ac3d19da3da10e5dbf0027a5cfe8e2f29d1870d347"
2025-12-04T09:35:36.5520618Z 		},
2025-12-04T09:35:36.5520837Z 		{
2025-12-04T09:35:36.5521203Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5521689Z 			"size": 826,
2025-12-04T09:35:36.5522165Z 			"digest": "sha256:4e3b10a5dd6aed29f238d604925e2a4f873141c1087c8dd4fdde5c61e7560893"
2025-12-04T09:35:36.5522928Z 		},
2025-12-04T09:35:36.5523134Z 		{
2025-12-04T09:35:36.5523511Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5524001Z 			"size": 724,
2025-12-04T09:35:36.5524460Z 			"digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084"
2025-12-04T09:35:36.5525011Z 		},
2025-12-04T09:35:36.5525230Z 		{
2025-12-04T09:35:36.5525600Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5526094Z 			"size": 149,
2025-12-04T09:35:36.5526570Z 			"digest": "sha256:3092fab73b59190b9facfc49bf18f58612172bc2fd68dfa339a1118632616939"
2025-12-04T09:35:36.5527110Z 		},
2025-12-04T09:35:36.5527330Z 		{
2025-12-04T09:35:36.5527711Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5528176Z 			"size": 136,
2025-12-04T09:35:36.5528660Z 			"digest": "sha256:20020dd28a15ba092fcbfe906ee39cdddfcc9d0b7eb42fdd6f4c08a984fa9c00"
2025-12-04T09:35:36.5529223Z 		},
2025-12-04T09:35:36.5529440Z 		{
2025-12-04T09:35:36.5529800Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5530281Z 			"size": 140,
2025-12-04T09:35:36.5530755Z 			"digest": "sha256:ae5280ce969dcff08c091e9a5f7641f13561b2b0ee44d78b7c3f81d8fe8e6d32"
2025-12-04T09:35:36.5531298Z 		},
2025-12-04T09:35:36.5531512Z 		{
2025-12-04T09:35:36.5531882Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5532356Z 			"size": 32,
2025-12-04T09:35:36.5532832Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:36.5533388Z 		},
2025-12-04T09:35:36.5533595Z 		{
2025-12-04T09:35:36.5533963Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5534445Z 			"size": 223,
2025-12-04T09:35:36.5534915Z 			"digest": "sha256:026e4484b749dfc556dcf7c8f45c1759518a89072e4dbc974d9405ada1582d03"
2025-12-04T09:35:36.5535454Z 		},
2025-12-04T09:35:36.5535672Z 		{
2025-12-04T09:35:36.5536053Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5536524Z 			"size": 256,
2025-12-04T09:35:36.5537015Z 			"digest": "sha256:1be9da2ce53d20d8befad5c024ee0eb41ee35984307cbd5621d8effae0353073"
2025-12-04T09:35:36.5537575Z 		},
2025-12-04T09:35:36.5537780Z 		{
2025-12-04T09:35:36.5538153Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5538632Z 			"size": 32,
2025-12-04T09:35:36.5539093Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:36.5539647Z 		},
2025-12-04T09:35:36.5539860Z 		{
2025-12-04T09:35:36.5540222Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5540705Z 			"size": 106,
2025-12-04T09:35:36.5541172Z 			"digest": "sha256:6481b7a1d9fb4001fd6f9e2a8d1600192529ddb957128e41671ca4630fa06ad4"
2025-12-04T09:35:36.5541717Z 		},
2025-12-04T09:35:36.5541920Z 		{
2025-12-04T09:35:36.5542294Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5542784Z 			"size": 312293471,
2025-12-04T09:35:36.5543274Z 			"digest": "sha256:fa519d18c39d8f297109c056017ebce7efc322d058afd27fdac5880d6c8d35b0"
2025-12-04T09:35:36.5543825Z 		},
2025-12-04T09:35:36.5544038Z 		{
2025-12-04T09:35:36.5544400Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5544885Z 			"size": 3058012325,
2025-12-04T09:35:36.5545485Z 			"digest": "sha256:d172f25b97f78fce0f6c6701f0db794b1c994a9cdf8cff9ddc6bdd1a1bea835c"
2025-12-04T09:35:36.5546039Z 		},
2025-12-04T09:35:36.5546254Z 		{
2025-12-04T09:35:36.5546633Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5547107Z 			"size": 129,
2025-12-04T09:35:36.5547582Z 			"digest": "sha256:fd60ab6b1c2c85a932e9894b5d0cf5c9e75fa21782e3028ea40d76017ecfbf85"
2025-12-04T09:35:36.5548133Z 		},
2025-12-04T09:35:36.5548345Z 		{
2025-12-04T09:35:36.5548704Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5549256Z 			"size": 880,
2025-12-04T09:35:36.5549730Z 			"digest": "sha256:0afe45579c2c87002db8c1abf7b32a748e6cb3b9b57e9b391f91cad9f84df476"
2025-12-04T09:35:36.5550271Z 		},
2025-12-04T09:35:36.5550481Z 		{
2025-12-04T09:35:36.5550852Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5551320Z 			"size": 724,
2025-12-04T09:35:36.5551787Z 			"digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084"
2025-12-04T09:35:36.5552325Z 		},
2025-12-04T09:35:36.5552527Z 		{
2025-12-04T09:35:36.5552899Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5553387Z 			"size": 139,
2025-12-04T09:35:36.5553854Z 			"digest": "sha256:5884ffd6720b47274f651262d5f9224f55960f9ea717faafe332aa20afb0ffa4"
2025-12-04T09:35:36.5554385Z 		},
2025-12-04T09:35:36.5554608Z 		{
2025-12-04T09:35:36.5554982Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5555454Z 			"size": 32,
2025-12-04T09:35:36.5555934Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:36.5556489Z 		},
2025-12-04T09:35:36.5556691Z 		{
2025-12-04T09:35:36.5557064Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5557549Z 			"size": 160,
2025-12-04T09:35:36.5558020Z 			"digest": "sha256:ab7a7c316fa7a9b7a96304ce96fafdffbc5cc6b960a4bb2def9131b36d9225c5"
2025-12-04T09:35:36.5558589Z 		},
2025-12-04T09:35:36.5558802Z 		{
2025-12-04T09:35:36.5559160Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5559642Z 			"size": 1012,
2025-12-04T09:35:36.5560131Z 			"digest": "sha256:c7775ce5574bdde75b4c09a1db19f7d0dc027f1f4c1f961022fc55833133e616"
2025-12-04T09:35:36.5560685Z 		},
2025-12-04T09:35:36.5560889Z 		{
2025-12-04T09:35:36.5561264Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5561749Z 			"size": 724,
2025-12-04T09:35:36.5562201Z 			"digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084"
2025-12-04T09:35:36.5562865Z 		},
2025-12-04T09:35:36.5563084Z 		{
2025-12-04T09:35:36.5563451Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5563948Z 			"size": 134,
2025-12-04T09:35:36.5564427Z 			"digest": "sha256:81945c4fb228ca73f4bac38b6d8a1eca7139585d4a078219dfaa16ea13945949"
2025-12-04T09:35:36.5564978Z 		},
2025-12-04T09:35:36.5565198Z 		{
2025-12-04T09:35:36.5565581Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5566058Z 			"size": 32,
2025-12-04T09:35:36.5566538Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:36.5567091Z 		},
2025-12-04T09:35:36.5567306Z 		{
2025-12-04T09:35:36.5567667Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5568146Z 			"size": 158,
2025-12-04T09:35:36.5568617Z 			"digest": "sha256:663cbe24d60bf42bc7a440cb4867e4287cacf54194dd3152406668e61d7e92e5"
2025-12-04T09:35:36.5569162Z 		},
2025-12-04T09:35:36.5569378Z 		{
2025-12-04T09:35:36.5569752Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5570219Z 			"size": 603,
2025-12-04T09:35:36.5570675Z 			"digest": "sha256:43f216b027865c8ca16f855703465445f3a548614a4d7e29387337b9651ac25c"
2025-12-04T09:35:36.5571206Z 		},
2025-12-04T09:35:36.5571405Z 		{
2025-12-04T09:35:36.5571880Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5572363Z 			"size": 724,
2025-12-04T09:35:36.5572828Z 			"digest": "sha256:9681563a88ff9e62494a2740e537440d3df978d466c9478d6a941fae8b57b084"
2025-12-04T09:35:36.5573356Z 		},
2025-12-04T09:35:36.5573567Z 		{
2025-12-04T09:35:36.5573944Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5574411Z 			"size": 155,
2025-12-04T09:35:36.5574889Z 			"digest": "sha256:c47c3cfeb68763aa19727693ad52fe0c80561a98139adaa2ab5eccea35c2d1b4"
2025-12-04T09:35:36.5575511Z 		},
2025-12-04T09:35:36.5575710Z 		{
2025-12-04T09:35:36.5576086Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5576570Z 			"size": 32,
2025-12-04T09:35:36.5577028Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:36.5577583Z 		},
2025-12-04T09:35:36.5577796Z 		{
2025-12-04T09:35:36.5578155Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5578641Z 			"size": 188,
2025-12-04T09:35:36.5579111Z 			"digest": "sha256:7d326b9e267322de9337ac2a71ddeac4cb61f28a018a6155863f83a164ad9437"
2025-12-04T09:35:36.5579655Z 		},
2025-12-04T09:35:36.5579854Z 		{
2025-12-04T09:35:36.5580227Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5580714Z 			"size": 1370,
2025-12-04T09:35:36.5581181Z 			"digest": "sha256:7ec8f17141c8335192fa21b660dfe1fe0ad16b202bc234e7d4ef063b35124158"
2025-12-04T09:35:36.5581732Z 		},
2025-12-04T09:35:36.5581945Z 		{
2025-12-04T09:35:36.5582315Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5582797Z 			"size": 32,
2025-12-04T09:35:36.5583267Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:36.5583807Z 		},
2025-12-04T09:35:36.5584021Z 		{
2025-12-04T09:35:36.5584390Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5584857Z 			"size": 136,
2025-12-04T09:35:36.5585332Z 			"digest": "sha256:26249ea175bf816b87c4c83e5efb78fd386a800fa10e819ba85b06858bcf877e"
2025-12-04T09:35:36.5585877Z 		},
2025-12-04T09:35:36.5586090Z 		{
2025-12-04T09:35:36.5586454Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5586938Z 			"size": 529,
2025-12-04T09:35:36.5587408Z 			"digest": "sha256:5e8e9ccb36f30a8c3a7e6a5011ee5001152f36c9c749397f3e234b1822326dd0"
2025-12-04T09:35:36.5587947Z 		},
2025-12-04T09:35:36.5588161Z 		{
2025-12-04T09:35:36.5588533Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5589007Z 			"size": 32,
2025-12-04T09:35:36.5589479Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:36.5590033Z 		},
2025-12-04T09:35:36.5590231Z 		{
2025-12-04T09:35:36.5590598Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5591076Z 			"size": 104,
2025-12-04T09:35:36.5591548Z 			"digest": "sha256:5bc72d4e1de83a1a254e8808f727118dd54cf048c14ff298a5299e015a116bfd"
2025-12-04T09:35:36.5592083Z 		},
2025-12-04T09:35:36.5592297Z 		{
2025-12-04T09:35:36.5592669Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5593139Z 			"size": 436,
2025-12-04T09:35:36.5593609Z 			"digest": "sha256:83cddbd497794c27254e11c4c00105d1f61399e7fef9d208a0be250724efd2c0"
2025-12-04T09:35:36.5594160Z 		},
2025-12-04T09:35:36.5594363Z 		{
2025-12-04T09:35:36.5594740Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5595236Z 			"size": 32,
2025-12-04T09:35:36.5595697Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:36.5596256Z 		},
2025-12-04T09:35:36.5596470Z 		{
2025-12-04T09:35:36.5596829Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5597320Z 			"size": 109,
2025-12-04T09:35:36.5613709Z 			"digest": "sha256:60c25d8c3dd2d78785f659204d0b1e64954ca581f89874b68ffe8fee23c6b661"
2025-12-04T09:35:36.5614292Z 		},
2025-12-04T09:35:36.5614518Z 		{
2025-12-04T09:35:36.5614905Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5615387Z 			"size": 1896,
2025-12-04T09:35:36.5615883Z 			"digest": "sha256:a534dcf4b9a9e5fabed742c8a8fc43c9cfe7346ea88ab3c177c3b14fd3afe00a"
2025-12-04T09:35:36.5616451Z 		},
2025-12-04T09:35:36.5616657Z 		{
2025-12-04T09:35:36.5617034Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5617521Z 			"size": 245582017,
2025-12-04T09:35:36.5618117Z 			"digest": "sha256:10138310c65c78d7de8375225ce37f5f7bfae7898e4e8bbcb90bd56a1bd05db4"
2025-12-04T09:35:36.5618720Z 		},
2025-12-04T09:35:36.5618936Z 		{
2025-12-04T09:35:36.5619311Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5619775Z 			"size": 106,
2025-12-04T09:35:36.5620252Z 			"digest": "sha256:8487679f252b6fb703dc9398d73aaeec68df724bfc961579ec5bdae62ebe3a37"
2025-12-04T09:35:36.5620809Z 		},
2025-12-04T09:35:36.5621011Z 		{
2025-12-04T09:35:36.5621386Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5621867Z 			"size": 162,
2025-12-04T09:35:36.5622330Z 			"digest": "sha256:52580ee2caa9ab69b0ac640315ee350e847cd0955c0a1eafa933a076669e87ad"
2025-12-04T09:35:36.5622881Z 		},
2025-12-04T09:35:36.5623095Z 		{
2025-12-04T09:35:36.5623453Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5623935Z 			"size": 7944,
2025-12-04T09:35:36.5624421Z 			"digest": "sha256:741c215cb2ffb295ab6a07fab3f0dfdde029463779ff9c0bbff4add26a340cfb"
2025-12-04T09:35:36.5624984Z 		},
2025-12-04T09:35:36.5625187Z 		{
2025-12-04T09:35:36.5625557Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5626038Z 			"size": 8070,
2025-12-04T09:35:36.5626489Z 			"digest": "sha256:d17f5aba17a608d1c7851cb3940a25d43f063385813051127074f693d0ede19b"
2025-12-04T09:35:36.5627032Z 		},
2025-12-04T09:35:36.5627247Z 		{
2025-12-04T09:35:36.5627613Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5628098Z 			"size": 304,
2025-12-04T09:35:36.5628581Z 			"digest": "sha256:bc08246bb4ba18c3ec5bc69e16b6b4e929c5bd0f3fae10eeb0b1a622a63d6fa2"
2025-12-04T09:35:36.5629133Z 		},
2025-12-04T09:35:36.5629346Z 		{
2025-12-04T09:35:36.5629719Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5630190Z 			"size": 23755574,
2025-12-04T09:35:36.5630679Z 			"digest": "sha256:7323bf084bf98f915db061b178c56525a0f95bd34d211b381c7527ad242c5a58"
2025-12-04T09:35:36.5631228Z 		},
2025-12-04T09:35:36.5631438Z 		{
2025-12-04T09:35:36.5631794Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5632274Z 			"size": 108,
2025-12-04T09:35:36.5632758Z 			"digest": "sha256:d344ecc97fd77c7d12fd68ddb67aeb6cc3dd2e723de5ad1ca2c80b45c8d6bd77"
2025-12-04T09:35:36.5633310Z 		},
2025-12-04T09:35:36.5633522Z 		{
2025-12-04T09:35:36.5633900Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5634376Z 			"size": 54145663,
2025-12-04T09:35:36.5634864Z 			"digest": "sha256:fb60b2d2147ff57c218f449f5b680132af8f7f8032ed69f422b48a3c3c1424f4"
2025-12-04T09:35:36.5635412Z 		},
2025-12-04T09:35:36.5635613Z 		{
2025-12-04T09:35:36.5635984Z 			"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
2025-12-04T09:35:36.5636463Z 			"size": 32,
2025-12-04T09:35:36.5636942Z 			"digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
2025-12-04T09:35:36.5637489Z 		}
2025-12-04T09:35:36.5637706Z 	]
2025-12-04T09:35:36.5637919Z }
2025-12-04T09:35:36.5638160Z + exit 0
2025-12-04T09:35:36.5668436Z ##[group]Run set -eux
2025-12-04T09:35:36.5668766Z set -eux
2025-12-04T09:35:36.5669264Z # It's ok if this steps fails, it would then be an anonymous user like what we used to have
2025-12-04T09:35:36.5670760Z aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token | jq --raw-output '.SecretString' | jq -r .docker_hub_readonly_token | docker login --username pytorchbot --password-stdin || true
2025-12-04T09:35:36.5679091Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:36.5679530Z env:
2025-12-04T09:35:36.5679783Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:36.5680072Z ##[endgroup]
2025-12-04T09:35:36.5712030Z + aws secretsmanager get-secret-value --secret-id docker_hub_readonly_token
2025-12-04T09:35:36.5712779Z + jq --raw-output .SecretString
2025-12-04T09:35:36.5714179Z + jq -r .docker_hub_readonly_token
2025-12-04T09:35:36.5715128Z + docker login --username pytorchbot --password-stdin
2025-12-04T09:35:37.2191286Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:35:37.2192017Z Configure a credential helper to remove this warning. See
2025-12-04T09:35:37.2192691Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:35:37.2193533Z 
2025-12-04T09:35:37.2193778Z Login Succeeded
2025-12-04T09:35:37.2284836Z ##[group]Run tag=${ECR_DOCKER_IMAGE##*:}
2025-12-04T09:35:37.2285280Z tag=${ECR_DOCKER_IMAGE##*:}
2025-12-04T09:35:37.2285752Z echo "docker pull ghcr.io/pytorch/ci-image:${tag/:/-}"
2025-12-04T09:35:37.2292535Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:37.2292979Z env:
2025-12-04T09:35:37.2293232Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:37.2294217Z   ECR_DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:37.2295239Z ##[endgroup]
2025-12-04T09:35:37.2325610Z docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:37.2376612Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main
2025-12-04T09:35:37.2377138Z with:
2025-12-04T09:35:37.2378045Z   docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:37.2379187Z   docker-registry: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:37.2379648Z env:
2025-12-04T09:35:37.2379892Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:37.2380183Z ##[endgroup]
2025-12-04T09:35:37.2396588Z ##[group]Run set -x
2025-12-04T09:35:37.2396909Z set -x
2025-12-04T09:35:37.2397161Z set +e
2025-12-04T09:35:37.2397420Z 
2025-12-04T09:35:37.2397683Z login() {
2025-12-04T09:35:37.2398231Z   aws ecr get-login-password --region us-east-1 | docker login -u AWS --password-stdin "$1"
2025-12-04T09:35:37.2398847Z }
2025-12-04T09:35:37.2399087Z 
2025-12-04T09:35:37.2399364Z retry () {
2025-12-04T09:35:37.2399670Z   $*  || (sleep 1 && $*) || (sleep 2 && $*)
2025-12-04T09:35:37.2400037Z }
2025-12-04T09:35:37.2400273Z 
2025-12-04T09:35:37.2400541Z retry login "${DOCKER_REGISTRY}"
2025-12-04T09:35:37.2401127Z 
2025-12-04T09:35:37.2401709Z IMAGE_SIZE=$(docker manifest inspect "${DOCKER_IMAGE}" | jq '[.layers[].size, .config.size] | add / 1024 / 1024')
2025-12-04T09:35:37.2402588Z echo "Compressed size of image in MB: ${IMAGE_SIZE}"
2025-12-04T09:35:37.2403015Z 
2025-12-04T09:35:37.2403260Z set -e
2025-12-04T09:35:37.2403661Z # ignore output since only exit code is used for conditional
2025-12-04T09:35:37.2404261Z # only pull docker image if it's not available locally
2025-12-04T09:35:37.2404896Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then
2025-12-04T09:35:37.2405502Z   retry docker pull "${DOCKER_IMAGE}"
2025-12-04T09:35:37.2405875Z fi
2025-12-04T09:35:37.2412175Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:35:37.2412621Z env:
2025-12-04T09:35:37.2412869Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:35:37.2413832Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:37.2414970Z   DOCKER_REGISTRY: 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:37.2415426Z ##[endgroup]
2025-12-04T09:35:37.2441727Z + set +e
2025-12-04T09:35:37.2442429Z + retry login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:37.2443205Z + login 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:37.2445817Z + aws ecr get-login-password --region us-east-1
2025-12-04T09:35:37.2447240Z + docker login -u AWS --password-stdin 308535385114.dkr.ecr.us-east-1.amazonaws.com
2025-12-04T09:35:37.8551727Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json.
2025-12-04T09:35:37.8552442Z Configure a credential helper to remove this warning. See
2025-12-04T09:35:37.8553461Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store
2025-12-04T09:35:37.8554074Z 
2025-12-04T09:35:37.8554208Z Login Succeeded
2025-12-04T09:35:37.8574384Z ++ docker manifest inspect 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:37.8575529Z ++ jq '[.layers[].size, .config.size] | add / 1024 / 1024'
2025-12-04T09:35:38.0720673Z + IMAGE_SIZE=13438.219573020935
2025-12-04T09:35:38.0721528Z + echo 'Compressed size of image in MB: 13438.219573020935'
2025-12-04T09:35:38.0722018Z + set -e
2025-12-04T09:35:38.0723128Z + docker inspect --type=image 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:38.0724639Z Compressed size of image in MB: 13438.219573020935
2025-12-04T09:35:38.0845114Z + retry docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:38.3318710Z + docker pull 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:35:38.3320950Z pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a: Pulling from pytorch/ci-image
2025-12-04T09:35:38.3322488Z 63e5bc7682b8: Pulling fs layer
2025-12-04T09:35:38.3322917Z 835841cca3b7: Pulling fs layer
2025-12-04T09:35:38.3323385Z 1bf1bb125dea: Pulling fs layer
2025-12-04T09:35:38.3323733Z b21856d1bf42: Pulling fs layer
2025-12-04T09:35:38.3324297Z 848ba2c095e2: Pulling fs layer
2025-12-04T09:35:38.3324736Z 029495b23122: Pulling fs layer
2025-12-04T09:35:38.3325109Z 073bb82063cf: Pulling fs layer
2025-12-04T09:35:38.3325452Z 59b639308833: Pulling fs layer
2025-12-04T09:35:38.3325770Z 1c6177b2970d: Pulling fs layer
2025-12-04T09:35:38.3326079Z fabe466dd5f3: Pulling fs layer
2025-12-04T09:35:38.3326472Z 2b5a11b41761: Pulling fs layer
2025-12-04T09:35:38.3326907Z 9681563a88ff: Pulling fs layer
2025-12-04T09:35:38.3327229Z dc0780902fca: Pulling fs layer
2025-12-04T09:35:38.3327716Z 5b09a2b135c8: Pulling fs layer
2025-12-04T09:35:38.3328143Z 4f4fb700ef54: Pulling fs layer
2025-12-04T09:35:38.3328492Z 5bfdaeb5578d: Pulling fs layer
2025-12-04T09:35:38.3328930Z 848ba2c095e2: Waiting
2025-12-04T09:35:38.3329276Z 0ef42867f370: Pulling fs layer
2025-12-04T09:35:38.3329662Z 446083e497f3: Pulling fs layer
2025-12-04T09:35:38.3330077Z d8a170bef0f4: Pulling fs layer
2025-12-04T09:35:38.3330472Z e2b6cd6a5bd0: Pulling fs layer
2025-12-04T09:35:38.3330860Z 93efc0181a22: Pulling fs layer
2025-12-04T09:35:38.3331167Z 7454c938f174: Pulling fs layer
2025-12-04T09:35:38.3331553Z 4d57ff55f6d4: Pulling fs layer
2025-12-04T09:35:38.3331873Z 5b09a2b135c8: Waiting
2025-12-04T09:35:38.3332134Z 073bb82063cf: Waiting
2025-12-04T09:35:38.3332406Z 4f4fb700ef54: Waiting
2025-12-04T09:35:38.3332687Z b0301534b4a5: Pulling fs layer
2025-12-04T09:35:38.3332993Z 1969e15d0c13: Pulling fs layer
2025-12-04T09:35:38.3333299Z 446083e497f3: Waiting
2025-12-04T09:35:38.3333582Z 73180a0f2d5a: Pulling fs layer
2025-12-04T09:35:38.3333890Z d8a170bef0f4: Waiting
2025-12-04T09:35:38.3334164Z ad81b25cb69f: Pulling fs layer
2025-12-04T09:35:38.3334479Z 0ef42867f370: Waiting
2025-12-04T09:35:38.3334754Z e2b6cd6a5bd0: Waiting
2025-12-04T09:35:38.3335083Z 8165374f8dcc: Pulling fs layer
2025-12-04T09:35:38.3335380Z 9681563a88ff: Waiting
2025-12-04T09:35:38.3335651Z 93efc0181a22: Waiting
2025-12-04T09:35:38.3336169Z 7779c0bb9be2: Pulling fs layer
2025-12-04T09:35:38.3336470Z b0301534b4a5: Waiting
2025-12-04T09:35:38.3336747Z fabe466dd5f3: Waiting
2025-12-04T09:35:38.3337020Z 1969e15d0c13: Waiting
2025-12-04T09:35:38.3337274Z 73180a0f2d5a: Waiting
2025-12-04T09:35:38.3337553Z 4d57ff55f6d4: Waiting
2025-12-04T09:35:38.3337819Z 2b5a11b41761: Waiting
2025-12-04T09:35:38.3338137Z 4d0a1c027262: Pulling fs layer
2025-12-04T09:35:38.3338526Z b21856d1bf42: Waiting
2025-12-04T09:35:38.3338956Z 8165374f8dcc: Waiting
2025-12-04T09:35:38.3339409Z 7779c0bb9be2: Waiting
2025-12-04T09:35:38.3339917Z a51e0dab2d59: Pulling fs layer
2025-12-04T09:35:38.3340475Z dc0780902fca: Waiting
2025-12-04T09:35:38.3340896Z 7454c938f174: Waiting
2025-12-04T09:35:38.3341211Z 4d0a1c027262: Waiting
2025-12-04T09:35:38.3341486Z a51e0dab2d59: Waiting
2025-12-04T09:35:38.3341778Z 3eb6d4ff040b: Pulling fs layer
2025-12-04T09:35:38.3342079Z 029495b23122: Waiting
2025-12-04T09:35:38.3342361Z b168858b8537: Pulling fs layer
2025-12-04T09:35:38.3342693Z d77a39278026: Pulling fs layer
2025-12-04T09:35:38.3342995Z 5bfdaeb5578d: Waiting
2025-12-04T09:35:38.3343280Z 3eb6d4ff040b: Waiting
2025-12-04T09:35:38.3343573Z 36fbd357280b: Pulling fs layer
2025-12-04T09:35:38.3343872Z d77a39278026: Waiting
2025-12-04T09:35:38.3344336Z b168858b8537: Waiting
2025-12-04T09:35:38.3344625Z 4e3b10a5dd6a: Pulling fs layer
2025-12-04T09:35:38.3344924Z 1c6177b2970d: Waiting
2025-12-04T09:35:38.3345196Z 36fbd357280b: Waiting
2025-12-04T09:35:38.3345485Z 3092fab73b59: Pulling fs layer
2025-12-04T09:35:38.3345808Z 20020dd28a15: Pulling fs layer
2025-12-04T09:35:38.3346119Z ae5280ce969d: Pulling fs layer
2025-12-04T09:35:38.3346429Z 4e3b10a5dd6a: Waiting
2025-12-04T09:35:38.3346702Z 3092fab73b59: Waiting
2025-12-04T09:35:38.3346957Z 20020dd28a15: Waiting
2025-12-04T09:35:38.3347238Z 026e4484b749: Pulling fs layer
2025-12-04T09:35:38.3347542Z ae5280ce969d: Waiting
2025-12-04T09:35:38.3347817Z 1be9da2ce53d: Pulling fs layer
2025-12-04T09:35:38.3348141Z 6481b7a1d9fb: Pulling fs layer
2025-12-04T09:35:38.3348456Z 026e4484b749: Waiting
2025-12-04T09:35:38.3348719Z 1be9da2ce53d: Waiting
2025-12-04T09:35:38.3349002Z fa519d18c39d: Pulling fs layer
2025-12-04T09:35:38.3349317Z 6481b7a1d9fb: Waiting
2025-12-04T09:35:38.3349595Z d172f25b97f7: Pulling fs layer
2025-12-04T09:35:38.3349919Z fd60ab6b1c2c: Pulling fs layer
2025-12-04T09:35:38.3350226Z fa519d18c39d: Waiting
2025-12-04T09:35:38.3350481Z d172f25b97f7: Waiting
2025-12-04T09:35:38.3350764Z 0afe45579c2c: Pulling fs layer
2025-12-04T09:35:38.3351075Z fd60ab6b1c2c: Waiting
2025-12-04T09:35:38.3351347Z 5884ffd6720b: Pulling fs layer
2025-12-04T09:35:38.3351653Z 0afe45579c2c: Waiting
2025-12-04T09:35:38.3351938Z ab7a7c316fa7: Pulling fs layer
2025-12-04T09:35:38.3352244Z 5884ffd6720b: Waiting
2025-12-04T09:35:38.3352508Z c7775ce5574b: Pulling fs layer
2025-12-04T09:35:38.3352812Z c7775ce5574b: Waiting
2025-12-04T09:35:38.3353090Z 81945c4fb228: Pulling fs layer
2025-12-04T09:35:38.3353387Z ab7a7c316fa7: Waiting
2025-12-04T09:35:38.3353679Z 663cbe24d60b: Pulling fs layer
2025-12-04T09:35:38.3354002Z 43f216b02786: Pulling fs layer
2025-12-04T09:35:38.3354294Z 81945c4fb228: Waiting
2025-12-04T09:35:38.3354568Z 43f216b02786: Waiting
2025-12-04T09:35:38.3354850Z c47c3cfeb687: Pulling fs layer
2025-12-04T09:35:38.3355153Z 663cbe24d60b: Waiting
2025-12-04T09:35:38.3355434Z 7d326b9e2673: Pulling fs layer
2025-12-04T09:35:38.3355745Z c47c3cfeb687: Waiting
2025-12-04T09:35:38.3356015Z 7ec8f17141c8: Pulling fs layer
2025-12-04T09:35:38.3356321Z 7d326b9e2673: Waiting
2025-12-04T09:35:38.3356600Z 26249ea175bf: Pulling fs layer
2025-12-04T09:35:38.3356907Z 5e8e9ccb36f3: Pulling fs layer
2025-12-04T09:35:38.3357218Z 7ec8f17141c8: Waiting
2025-12-04T09:35:38.3357488Z 26249ea175bf: Waiting
2025-12-04T09:35:38.3357763Z 5bc72d4e1de8: Pulling fs layer
2025-12-04T09:35:38.3358087Z 83cddbd49779: Pulling fs layer
2025-12-04T09:35:38.3358414Z 60c25d8c3dd2: Pulling fs layer
2025-12-04T09:35:38.3358738Z a534dcf4b9a9: Pulling fs layer
2025-12-04T09:35:38.3359158Z 5bc72d4e1de8: Waiting
2025-12-04T09:35:38.3359441Z 10138310c65c: Pulling fs layer
2025-12-04T09:35:38.3359810Z 60c25d8c3dd2: Waiting
2025-12-04T09:35:38.3360132Z 83cddbd49779: Waiting
2025-12-04T09:35:38.3360413Z 8487679f252b: Pulling fs layer
2025-12-04T09:35:38.3360726Z a534dcf4b9a9: Waiting
2025-12-04T09:35:38.3360995Z 52580ee2caa9: Pulling fs layer
2025-12-04T09:35:38.3361386Z 741c215cb2ff: Pulling fs layer
2025-12-04T09:35:38.3361923Z d17f5aba17a6: Pulling fs layer
2025-12-04T09:35:38.3362542Z 10138310c65c: Waiting
2025-12-04T09:35:38.3362982Z 52580ee2caa9: Waiting
2025-12-04T09:35:38.3363468Z bc08246bb4ba: Pulling fs layer
2025-12-04T09:35:38.3363971Z d17f5aba17a6: Waiting
2025-12-04T09:35:38.3364438Z 741c215cb2ff: Waiting
2025-12-04T09:35:38.3364916Z 7323bf084bf9: Pulling fs layer
2025-12-04T09:35:38.3365424Z bc08246bb4ba: Waiting
2025-12-04T09:35:38.3365869Z d344ecc97fd7: Pulling fs layer
2025-12-04T09:35:38.3366244Z fb60b2d2147f: Pulling fs layer
2025-12-04T09:35:38.3366555Z 7323bf084bf9: Waiting
2025-12-04T09:35:38.3366883Z d344ecc97fd7: Waiting
2025-12-04T09:35:38.3367149Z 8487679f252b: Waiting
2025-12-04T09:35:38.3367476Z fb60b2d2147f: Waiting
2025-12-04T09:35:38.4026485Z 835841cca3b7: Download complete
2025-12-04T09:35:38.4773573Z b21856d1bf42: Verifying Checksum
2025-12-04T09:35:38.4774017Z b21856d1bf42: Download complete
2025-12-04T09:35:38.5656352Z 848ba2c095e2: Download complete
2025-12-04T09:35:38.6437962Z 029495b23122: Download complete
2025-12-04T09:35:38.6859973Z 63e5bc7682b8: Verifying Checksum
2025-12-04T09:35:38.6860573Z 63e5bc7682b8: Download complete
2025-12-04T09:35:38.7601251Z 59b639308833: Download complete
2025-12-04T09:35:38.7729943Z 073bb82063cf: Verifying Checksum
2025-12-04T09:35:38.8775829Z 073bb82063cf: Download complete
2025-12-04T09:35:38.8776244Z fabe466dd5f3: Download complete
2025-12-04T09:35:38.9504333Z 2b5a11b41761: Download complete
2025-12-04T09:35:39.0219868Z 9681563a88ff: Verifying Checksum
2025-12-04T09:35:39.0220306Z 9681563a88ff: Download complete
2025-12-04T09:35:39.0952039Z dc0780902fca: Download complete
2025-12-04T09:35:39.6988056Z 63e5bc7682b8: Pull complete
2025-12-04T09:35:39.7240595Z 835841cca3b7: Pull complete
2025-12-04T09:35:39.9351872Z 1c6177b2970d: Verifying Checksum
2025-12-04T09:35:39.9352330Z 1c6177b2970d: Download complete
2025-12-04T09:35:40.0217585Z 5bfdaeb5578d: Verifying Checksum
2025-12-04T09:35:40.0218027Z 5bfdaeb5578d: Download complete
2025-12-04T09:35:40.1169938Z 0ef42867f370: Download complete
2025-12-04T09:35:40.1892097Z 446083e497f3: Verifying Checksum
2025-12-04T09:35:40.1892759Z 446083e497f3: Download complete
2025-12-04T09:35:40.3014746Z d8a170bef0f4: Verifying Checksum
2025-12-04T09:35:40.3015198Z d8a170bef0f4: Download complete
2025-12-04T09:35:40.3633398Z e2b6cd6a5bd0: Download complete
2025-12-04T09:35:40.4368059Z 93efc0181a22: Verifying Checksum
2025-12-04T09:35:40.4368512Z 93efc0181a22: Download complete
2025-12-04T09:35:40.5291129Z 7454c938f174: Verifying Checksum
2025-12-04T09:35:40.5291546Z 7454c938f174: Download complete
2025-12-04T09:35:40.6109950Z 4d57ff55f6d4: Download complete
2025-12-04T09:35:41.5176513Z 1bf1bb125dea: Verifying Checksum
2025-12-04T09:35:41.5176955Z 1bf1bb125dea: Download complete
2025-12-04T09:35:41.6069828Z 1969e15d0c13: Verifying Checksum
2025-12-04T09:35:41.6070514Z 1969e15d0c13: Download complete
2025-12-04T09:35:41.9858347Z 73180a0f2d5a: Verifying Checksum
2025-12-04T09:35:41.9859052Z 73180a0f2d5a: Download complete
2025-12-04T09:35:42.0756489Z ad81b25cb69f: Verifying Checksum
2025-12-04T09:35:42.0757180Z ad81b25cb69f: Download complete
2025-12-04T09:35:42.1697434Z 8165374f8dcc: Verifying Checksum
2025-12-04T09:35:42.1698077Z 8165374f8dcc: Download complete
2025-12-04T09:35:49.3074024Z 7779c0bb9be2: Verifying Checksum
2025-12-04T09:35:49.3074458Z 7779c0bb9be2: Download complete
2025-12-04T09:35:49.3855256Z 4d0a1c027262: Verifying Checksum
2025-12-04T09:35:49.3855692Z 4d0a1c027262: Download complete
2025-12-04T09:35:49.4753820Z a51e0dab2d59: Verifying Checksum
2025-12-04T09:35:49.4754376Z a51e0dab2d59: Download complete
2025-12-04T09:35:49.5650574Z 3eb6d4ff040b: Verifying Checksum
2025-12-04T09:35:49.5651006Z 3eb6d4ff040b: Download complete
2025-12-04T09:35:49.6447329Z b168858b8537: Verifying Checksum
2025-12-04T09:35:49.6447759Z b168858b8537: Download complete
2025-12-04T09:35:50.1389068Z d77a39278026: Verifying Checksum
2025-12-04T09:35:50.1389500Z d77a39278026: Download complete
2025-12-04T09:35:50.2654951Z 36fbd357280b: Verifying Checksum
2025-12-04T09:35:50.2655377Z 36fbd357280b: Download complete
2025-12-04T09:35:50.3542715Z 4e3b10a5dd6a: Verifying Checksum
2025-12-04T09:35:50.3543102Z 4e3b10a5dd6a: Download complete
2025-12-04T09:35:50.3777890Z 1bf1bb125dea: Pull complete
2025-12-04T09:35:50.4539598Z 3092fab73b59: Verifying Checksum
2025-12-04T09:35:50.4540033Z 3092fab73b59: Download complete
2025-12-04T09:35:50.5453564Z 20020dd28a15: Verifying Checksum
2025-12-04T09:35:50.5453977Z 20020dd28a15: Download complete
2025-12-04T09:35:50.5875509Z b21856d1bf42: Pull complete
2025-12-04T09:35:50.6047991Z ae5280ce969d: Verifying Checksum
2025-12-04T09:35:50.6048369Z ae5280ce969d: Download complete
2025-12-04T09:35:50.7028839Z 026e4484b749: Verifying Checksum
2025-12-04T09:35:50.7029271Z 026e4484b749: Download complete
2025-12-04T09:35:50.7777399Z 848ba2c095e2: Pull complete
2025-12-04T09:35:50.8004378Z 1be9da2ce53d: Verifying Checksum
2025-12-04T09:35:50.8004758Z 1be9da2ce53d: Download complete
2025-12-04T09:35:50.8748049Z 6481b7a1d9fb: Verifying Checksum
2025-12-04T09:35:50.8748450Z 6481b7a1d9fb: Download complete
2025-12-04T09:35:50.9925424Z 029495b23122: Pull complete
2025-12-04T09:35:51.1589274Z 073bb82063cf: Pull complete
2025-12-04T09:35:51.2672407Z 59b639308833: Pull complete
2025-12-04T09:35:53.9234675Z 1c6177b2970d: Pull complete
2025-12-04T09:35:54.1438146Z fabe466dd5f3: Pull complete
2025-12-04T09:35:54.3707497Z 2b5a11b41761: Pull complete
2025-12-04T09:35:54.5943671Z 9681563a88ff: Pull complete
2025-12-04T09:35:54.8093076Z dc0780902fca: Pull complete
2025-12-04T09:35:55.7681823Z fa519d18c39d: Verifying Checksum
2025-12-04T09:35:55.7682538Z fa519d18c39d: Download complete
2025-12-04T09:36:25.4928213Z 5b09a2b135c8: Download complete
2025-12-04T09:36:25.5833939Z fd60ab6b1c2c: Download complete
2025-12-04T09:36:25.6823415Z 0afe45579c2c: Download complete
2025-12-04T09:36:25.7457950Z 5884ffd6720b: Verifying Checksum
2025-12-04T09:36:25.7460813Z 5884ffd6720b: Download complete
2025-12-04T09:36:25.8568706Z ab7a7c316fa7: Download complete
2025-12-04T09:36:25.9644531Z c7775ce5574b: Verifying Checksum
2025-12-04T09:36:25.9645063Z c7775ce5574b: Download complete
2025-12-04T09:36:26.0692352Z 81945c4fb228: Verifying Checksum
2025-12-04T09:36:26.0692933Z 81945c4fb228: Download complete
2025-12-04T09:36:26.1756113Z 663cbe24d60b: Verifying Checksum
2025-12-04T09:36:26.1756563Z 663cbe24d60b: Download complete
2025-12-04T09:36:26.2658065Z 43f216b02786: Verifying Checksum
2025-12-04T09:36:26.2658744Z 43f216b02786: Download complete
2025-12-04T09:36:26.3573525Z c47c3cfeb687: Verifying Checksum
2025-12-04T09:36:26.3573943Z c47c3cfeb687: Download complete
2025-12-04T09:36:26.4407766Z 7d326b9e2673: Download complete
2025-12-04T09:36:26.5331117Z 7ec8f17141c8: Verifying Checksum
2025-12-04T09:36:26.5331697Z 7ec8f17141c8: Download complete
2025-12-04T09:36:26.6032682Z 26249ea175bf: Verifying Checksum
2025-12-04T09:36:26.6033362Z 26249ea175bf: Download complete
2025-12-04T09:36:26.6854862Z 5e8e9ccb36f3: Verifying Checksum
2025-12-04T09:36:26.6855574Z 5e8e9ccb36f3: Download complete
2025-12-04T09:36:26.7808976Z 5bc72d4e1de8: Verifying Checksum
2025-12-04T09:36:26.7809693Z 5bc72d4e1de8: Download complete
2025-12-04T09:36:26.8807716Z 83cddbd49779: Verifying Checksum
2025-12-04T09:36:26.8808139Z 83cddbd49779: Download complete
2025-12-04T09:36:26.9807695Z 60c25d8c3dd2: Download complete
2025-12-04T09:36:27.0573453Z a534dcf4b9a9: Verifying Checksum
2025-12-04T09:36:27.0573859Z a534dcf4b9a9: Download complete
2025-12-04T09:36:30.6460855Z 10138310c65c: Verifying Checksum
2025-12-04T09:36:30.6461283Z 10138310c65c: Download complete
2025-12-04T09:36:30.7251536Z 8487679f252b: Verifying Checksum
2025-12-04T09:36:30.7251948Z 8487679f252b: Download complete
2025-12-04T09:36:30.8172006Z 52580ee2caa9: Download complete
2025-12-04T09:36:30.9446978Z 741c215cb2ff: Download complete
2025-12-04T09:36:31.0497519Z d17f5aba17a6: Verifying Checksum
2025-12-04T09:36:31.0498180Z d17f5aba17a6: Download complete
2025-12-04T09:36:31.1168580Z bc08246bb4ba: Download complete
2025-12-04T09:36:31.5538843Z 7323bf084bf9: Verifying Checksum
2025-12-04T09:36:31.5539456Z 7323bf084bf9: Download complete
2025-12-04T09:36:31.6505785Z d344ecc97fd7: Verifying Checksum
2025-12-04T09:36:31.6506220Z d344ecc97fd7: Download complete
2025-12-04T09:36:32.6191596Z fb60b2d2147f: Download complete
2025-12-04T09:36:46.7157719Z d172f25b97f7: Verifying Checksum
2025-12-04T09:36:46.7158156Z d172f25b97f7: Download complete
2025-12-04T09:37:17.5410826Z 5b09a2b135c8: Pull complete
2025-12-04T09:37:17.7544471Z 4f4fb700ef54: Pull complete
2025-12-04T09:37:17.9689668Z 5bfdaeb5578d: Pull complete
2025-12-04T09:37:18.2261689Z 0ef42867f370: Pull complete
2025-12-04T09:37:18.4542298Z 446083e497f3: Pull complete
2025-12-04T09:37:18.7282366Z d8a170bef0f4: Pull complete
2025-12-04T09:37:18.9445716Z e2b6cd6a5bd0: Pull complete
2025-12-04T09:37:19.1673229Z 93efc0181a22: Pull complete
2025-12-04T09:37:19.3875083Z 7454c938f174: Pull complete
2025-12-04T09:37:19.6060104Z 4d57ff55f6d4: Pull complete
2025-12-04T09:37:20.5723932Z b0301534b4a5: Verifying Checksum
2025-12-04T09:37:20.5724372Z b0301534b4a5: Download complete
2025-12-04T09:38:36.8002535Z b0301534b4a5: Pull complete
2025-12-04T09:38:37.0149409Z 1969e15d0c13: Pull complete
2025-12-04T09:38:37.8159429Z 73180a0f2d5a: Pull complete
2025-12-04T09:38:38.0375045Z ad81b25cb69f: Pull complete
2025-12-04T09:38:38.2693904Z 8165374f8dcc: Pull complete
2025-12-04T09:38:46.3434348Z 7779c0bb9be2: Pull complete
2025-12-04T09:38:46.5642926Z 4d0a1c027262: Pull complete
2025-12-04T09:38:46.7895181Z a51e0dab2d59: Pull complete
2025-12-04T09:38:47.1166699Z 3eb6d4ff040b: Pull complete
2025-12-04T09:38:47.2914514Z b168858b8537: Pull complete
2025-12-04T09:38:47.7406934Z d77a39278026: Pull complete
2025-12-04T09:38:47.9656984Z 36fbd357280b: Pull complete
2025-12-04T09:38:48.1737088Z 4e3b10a5dd6a: Pull complete
2025-12-04T09:38:48.5776841Z 3092fab73b59: Pull complete
2025-12-04T09:38:48.7914997Z 20020dd28a15: Pull complete
2025-12-04T09:38:49.0219358Z ae5280ce969d: Pull complete
2025-12-04T09:38:49.4135588Z 026e4484b749: Pull complete
2025-12-04T09:38:49.6383296Z 1be9da2ce53d: Pull complete
2025-12-04T09:38:50.0297281Z 6481b7a1d9fb: Pull complete
2025-12-04T09:38:51.8398427Z fa519d18c39d: Pull complete
2025-12-04T09:39:51.5372647Z d172f25b97f7: Pull complete
2025-12-04T09:39:51.6565338Z fd60ab6b1c2c: Pull complete
2025-12-04T09:39:51.7640957Z 0afe45579c2c: Pull complete
2025-12-04T09:39:51.9945632Z 5884ffd6720b: Pull complete
2025-12-04T09:39:52.2521721Z ab7a7c316fa7: Pull complete
2025-12-04T09:39:52.4610155Z c7775ce5574b: Pull complete
2025-12-04T09:39:52.6448674Z 81945c4fb228: Pull complete
2025-12-04T09:39:52.7132800Z 663cbe24d60b: Pull complete
2025-12-04T09:39:52.7481517Z 43f216b02786: Pull complete
2025-12-04T09:39:52.8188782Z c47c3cfeb687: Pull complete
2025-12-04T09:39:52.8825677Z 7d326b9e2673: Pull complete
2025-12-04T09:39:52.9131833Z 7ec8f17141c8: Pull complete
2025-12-04T09:39:52.9780199Z 26249ea175bf: Pull complete
2025-12-04T09:39:53.0145600Z 5e8e9ccb36f3: Pull complete
2025-12-04T09:39:53.0838749Z 5bc72d4e1de8: Pull complete
2025-12-04T09:39:53.1184437Z 83cddbd49779: Pull complete
2025-12-04T09:39:53.1827405Z 60c25d8c3dd2: Pull complete
2025-12-04T09:39:53.2064660Z a534dcf4b9a9: Pull complete
2025-12-04T09:39:59.9416997Z 10138310c65c: Pull complete
2025-12-04T09:40:00.1087683Z 8487679f252b: Pull complete
2025-12-04T09:40:00.3427433Z 52580ee2caa9: Pull complete
2025-12-04T09:40:00.4696300Z 741c215cb2ff: Pull complete
2025-12-04T09:40:00.5940866Z d17f5aba17a6: Pull complete
2025-12-04T09:40:00.7370839Z bc08246bb4ba: Pull complete
2025-12-04T09:40:02.2096628Z 7323bf084bf9: Pull complete
2025-12-04T09:40:02.4209394Z d344ecc97fd7: Pull complete
2025-12-04T09:40:04.3803409Z fb60b2d2147f: Pull complete
2025-12-04T09:40:04.5721823Z Digest: sha256:ae30f11a5b50741bd652aa0c94ad89ef791c4e50157eff642748620825cf7940
2025-12-04T09:40:04.5835755Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:40:04.5867092Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:40:04.5925927Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:04.5927098Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:04.5935194Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:40:04.5935659Z env:
2025-12-04T09:40:04.5935897Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:04.5936201Z ##[endgroup]
2025-12-04T09:40:04.6144505Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main
2025-12-04T09:40:04.6145010Z with:
2025-12-04T09:40:04.6145266Z   driver-version: 525.105.17
2025-12-04T09:40:04.6145567Z env:
2025-12-04T09:40:04.6145795Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:04.6146096Z ##[endgroup]
2025-12-04T09:40:04.6169156Z ##[group]Run echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:04.6170259Z echo "IN_CONTAINER_RUNNER=$(if [ -f /.inarc ] || [ -f /.incontainer ]; then echo true ; else echo false; fi)" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:04.6177931Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:40:04.6178377Z env:
2025-12-04T09:40:04.6178630Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:04.6178938Z ##[endgroup]
2025-12-04T09:40:04.6238232Z ##[group]Run set -euo pipefail
2025-12-04T09:40:04.6238616Z set -euo pipefail
2025-12-04T09:40:04.6238970Z
2025-12-04T09:40:04.6239204Z has_gpu=false
2025-12-04T09:40:04.6239497Z devices=""
2025-12-04T09:40:04.6239764Z
2025-12-04T09:40:04.6240070Z if command -v nvidia-smi >/dev/null 2>&1; then
2025-12-04T09:40:04.6240602Z   if nvidia-smi -L >/tmp/nvidia_devices 2>/dev/null; then
2025-12-04T09:40:04.6241332Z     has_gpu=true
2025-12-04T09:40:04.6241684Z     devices=$(cat /tmp/nvidia_devices)
2025-12-04T09:40:04.6242054Z   fi
2025-12-04T09:40:04.6242381Z fi
2025-12-04T09:40:04.6242623Z
2025-12-04T09:40:04.6242875Z if [ "$has_gpu" = false ]; then
2025-12-04T09:40:04.6243345Z   if ls /dev/nvidia* >/tmp/nvidia_devices 2>/dev/null; then
2025-12-04T09:40:04.6243807Z     has_gpu=true
2025-12-04T09:40:04.6244167Z     devices=$(cat /tmp/nvidia_devices)
2025-12-04T09:40:04.6244527Z   fi
2025-12-04T09:40:04.6244771Z fi
2025-12-04T09:40:04.6245053Z
2025-12-04T09:40:04.6245405Z if [ "$has_gpu" = false ] && command -v lspci >/dev/null 2>&1; then
2025-12-04T09:40:04.6246015Z   if lspci | grep -i 'nvidia' >/tmp/nvidia_devices 2>/dev/null; then
2025-12-04T09:40:04.6246508Z     has_gpu=true
2025-12-04T09:40:04.6246857Z     devices=$(cat /tmp/nvidia_devices)
2025-12-04T09:40:04.6247215Z   fi
2025-12-04T09:40:04.6247462Z fi
2025-12-04T09:40:04.6247702Z
2025-12-04T09:40:04.6248047Z printf 'HAS_NVIDIA=%s\n' "$has_gpu" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:04.6248689Z printf 'DETECTED_DEVICES<<EOF\n%s\nEOF\n' "$devices" >> "$GITHUB_OUTPUT"
2025-12-04T09:40:04.6255228Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:40:04.6255662Z env:
2025-12-04T09:40:04.6256065Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:04.6256370Z ##[endgroup]
2025-12-04T09:40:06.1634160Z ##[group]Run if [ "${HAS_NVIDIA}" = "true" ]; then
2025-12-04T09:40:06.1634647Z if [ "${HAS_NVIDIA}" = "true" ]; then
2025-12-04T09:40:06.1635094Z   echo "HAS_NVIDIA_GPU=true" >> "${GITHUB_ENV}"
2025-12-04T09:40:06.1635701Z   echo "GPU_FLAG=--gpus all -e NVIDIA_DRIVER_CAPABILITIES=all" >> "${GITHUB_ENV}"
2025-12-04T09:40:06.1636255Z else
2025-12-04T09:40:06.1636585Z   echo "HAS_NVIDIA_GPU=false" >> "${GITHUB_ENV}"
2025-12-04T09:40:06.1636990Z fi
2025-12-04T09:40:06.1644327Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:40:06.1644775Z env:
2025-12-04T09:40:06.1645025Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:06.1645311Z   HAS_NVIDIA: true
2025-12-04T09:40:06.1645575Z ##[endgroup]
2025-12-04T09:40:06.1724654Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482
2025-12-04T09:40:06.1725178Z with:
2025-12-04T09:40:06.1725416Z   timeout_minutes: 10
2025-12-04T09:40:06.1725701Z   max_attempts: 3
2025-12-04T09:40:06.1758868Z   command: # Is it disgusting to have a full shell script here in this github action? Sure
# But is it the best way to make it so that this action relies on nothing else? Absolutely
set -eou pipefail

DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID)
DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"

install_nvidia_docker2_amzn2() {
    (
        set -x
        # Needed for yum-config-manager
        sudo yum install -y yum-utils
        if [[ "${DISTRIBUTION}" == "amzn2023" ]] ; then
          YUM_REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo"
        else
          # Amazon Linux 2
          YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo"
        fi

        sudo yum-config-manager --add-repo "${YUM_REPO_URL}"
        sudo yum install -y \
          nvidia-container-toolkit-1.17.8 \
          libnvidia-container-tools-1.17.8 \
          libnvidia-container1-1.17.8 \
          nvidia-container-toolkit-base-1.17.8
        sudo systemctl restart docker
    )
}

install_nvidia_docker2_ubuntu20() {
    (
        set -x
        # Install the NVIDIA container toolkit if nvidia-docker2 is not installed
        status="$(dpkg-query -W --showformat='${db:Status-Status}' nvidia-docker2 2>&1)"
        if [ ! $? = 0 ] || [ ! "$status" = installed ]; then
          sudo apt-get install -y nvidia-container-toolkit-1.17.8
          sudo systemctl restart docker
        fi
    )
}

pre_install_nvidia_driver_amzn2() {
    (
        # Purge any nvidia driver installed from RHEL repo
        sudo yum remove -y nvidia-driver-latest-dkms
    )
}

install_nvidia_driver_common() {
    (
        # Try to gather more information about the runner and its existing NVIDIA driver if any
        echo "Before installing NVIDIA driver"
        lspci
        lsmod
        modinfo nvidia || true

        HAS_NVIDIA_DRIVER=0
        # Check if NVIDIA driver has already been installed
        if [ -x "$(command -v nvidia-smi)" ]; then
            set +e
            # The driver exists, check its version next. Also check only the first GPU if there is more than one,
            # so that the same driver version is not printed over multiple lines
            INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
            NVIDIA_SMI_STATUS=$?

            if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
                echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing"
            elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then
                echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing"

                # Turn off persistent mode so that the installation script can unload the kernel module
                sudo killall nvidia-persistenced || true
            else
                HAS_NVIDIA_DRIVER=1
                echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation"
            fi
            set -e
        fi

        if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then
            # CAUTION: this may need to be updated in future
            if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then
                  sudo yum groupinstall -y "Development Tools"
                  # ensure our kernel install is the same as our underlying kernel,
                  # groupinstall "Development Tools" has a habit of mismatching kernel headers
                  sudo yum install -y "kernel-devel-uname-r == $(uname -r)"
                  sudo modprobe backlight
            fi
            sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"

            set +e
            sudo /bin/bash /tmp/nvidia_driver -s --no-drm
            NVIDIA_INSTALLATION_STATUS=$?

            RESET_GPU=0
            if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then
                sudo cat /var/log/nvidia-installer.log
                # Failed to install the NVIDIA driver, so try to reset the GPU
                RESET_GPU=1
            elif [ -x "$(command -v nvidia-smi)" ]; then
                # Check again if nvidia-smi works even if the driver installation completes successfully
                INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
                NVIDIA_SMI_STATUS=$?

                if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
                    RESET_GPU=1
                fi
            fi

            if [ "$RESET_GPU" -eq 1 ]; then
                NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1)
                # The GPU can get stuck in a failure state if the test somehow crashes the GPU microcode. When this
                # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388
                for PCI_ID in $NVIDIA_DEVICES; do
                    DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable)

                    echo "Reseting $PCI_ID (enabled state: $DEVICE_ENABLED)"
                    # This requires sudo permission of course
                    echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset
                    sleep 1
                done
            fi

            sudo rm -fv /tmp/nvidia_driver
            set -e
        fi
    )
}

post_install_nvidia_driver_common() {
    (
        sudo modprobe nvidia || true
        echo "After installing NVIDIA driver"
        lspci
        lsmod
        modinfo nvidia || true

        (
            set +e

            nvidia-smi
            # NB: Annoyingly, nvidia-smi command returns successfully with return code 0 even in
            # the case where the driver has already crashed as it still can get the driver version
            # and some basic information like the bus ID.  However, the rest of the information
            # would be missing (ERR!), for example:
            #
            # +-----------------------------------------------------------------------------+
            # | NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
            # |-------------------------------+----------------------+----------------------+
            # | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
            # | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
            # |                               |                      |               MIG M. |
            # |===============================+======================+======================|
            # |   0  ERR!                Off  | 00000000:00:1E.0 Off |                 ERR! |
            # |ERR!  ERR! ERR!    ERR! / ERR! |   4184MiB / 23028MiB |    ERR!      Default |
            # |                               |                      |                 ERR! |
            # +-------------------------------+----------------------+----------------------+
            #
            # +-----------------------------------------------------------------------------+
            # | Processes:                                                                  |
            # |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
            # |        ID   ID                                                   Usage      |
            # |=============================================================================|
            # +-----------------------------------------------------------------------------+
            #
            # This should be reported as a failure instead, as it is guaranteed to fail when
            # Docker tries to run with --gpus all
            #
            # So, the correct check here is to query one of the missing pieces of info, like
            # the GPU name, so that the command can fail accordingly
            nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
            NVIDIA_SMI_STATUS=$?

            # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285
            if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then
                echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}"
            else
                echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}"
                exit ${NVIDIA_SMI_STATUS}
            fi
            set -e
        )
    )
}

install_nvidia_driver_amzn2() {
    (
        set -x
        pre_install_nvidia_driver_amzn2
        install_nvidia_driver_common
        post_install_nvidia_driver_common
    )
}

install_nvidia_driver_ubuntu20() {
    (
        set -x
        install_nvidia_driver_common
        post_install_nvidia_driver_common
    )
}

echo "== Installing nvidia driver ${DRIVER_FN} =="
case "${DISTRIBUTION}" in
    amzn*)
        install_nvidia_driver_amzn2
        ;;
    ubuntu20.04)
        install_nvidia_driver_ubuntu20
        ;;
    *)
        echo "ERROR: Unknown distribution ${DISTRIBUTION}"
        exit 1
        ;;
esac

# Install container toolkit based on distribution
echo "== Installing nvidia container toolkit for ${DISTRIBUTION} =="
case "${DISTRIBUTION}" in
    amzn*)
        install_nvidia_docker2_amzn2
        ;;
    ubuntu20.04)
        install_nvidia_docker2_ubuntu20
        ;;
    *)
        echo "ERROR: Unknown distribution ${DISTRIBUTION}"
        exit 1
        ;;
esac

# Fix https://github.com/NVIDIA/nvidia-docker/issues/1648 on runners with
# more than one GPU. This just needs to be run once. The command fails
# on subsequent runs and complains that the mode is already on, but that's
# ok
sudo nvidia-persistenced || true
# This should show persistence mode ON
nvidia-smi

# check if the container-toolkit is correctly installed and CUDA is available inside a container
docker run --rm -t --gpus=all public.ecr.aws/docker/library/python:3.13 nvidia-smi

2025-12-04T09:40:06.1792717Z   retry_wait_seconds: 10
2025-12-04T09:40:06.1793047Z   polling_interval_seconds: 1
2025-12-04T09:40:06.1793372Z   warning_on_retry: true
2025-12-04T09:40:06.1793693Z   continue_on_error: false
2025-12-04T09:40:06.1793995Z env:
2025-12-04T09:40:06.1794220Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:40:06.1794523Z   HAS_NVIDIA_GPU: true
2025-12-04T09:40:06.1794886Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:40:06.1795304Z   DRIVER_VERSION: 525.105.17
2025-12-04T09:40:06.1795605Z ##[endgroup]
2025-12-04T09:40:06.2980502Z == Installing nvidia driver NVIDIA-Linux-x86_64-525.105.17.run ==
2025-12-04T09:40:06.2981675Z + pre_install_nvidia_driver_amzn2
2025-12-04T09:40:06.2982084Z + sudo yum remove -y nvidia-driver-latest-dkms
2025-12-04T09:40:06.8903296Z No match for argument: nvidia-driver-latest-dkms
2025-12-04T09:40:06.8903795Z No packages marked for removal.
2025-12-04T09:40:06.8976909Z Dependencies resolved.
2025-12-04T09:40:06.8987758Z Nothing to do.
2025-12-04T09:40:06.8989434Z Complete!
2025-12-04T09:40:06.9357966Z + install_nvidia_driver_common
2025-12-04T09:40:06.9361505Z + echo 'Before installing NVIDIA driver'
2025-12-04T09:40:06.9361899Z Before installing NVIDIA driver
2025-12-04T09:40:06.9363306Z + lspci
2025-12-04T09:40:06.9552354Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
2025-12-04T09:40:06.9552990Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
2025-12-04T09:40:06.9553681Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
2025-12-04T09:40:06.9554358Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
2025-12-04T09:40:06.9554973Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller
2025-12-04T09:40:06.9555653Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
2025-12-04T09:40:06.9556274Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
2025-12-04T09:40:06.9556908Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
2025-12-04T09:40:06.9557426Z + lsmod
2025-12-04T09:40:06.9597072Z Module                  Size  Used by
2025-12-04T09:40:06.9597671Z nvidia_uvm           1925120  0
2025-12-04T09:40:06.9598033Z nvidia              14286848  1 nvidia_uvm
2025-12-04T09:40:06.9598397Z drm                   602112  1 nvidia
2025-12-04T09:40:06.9598765Z drm_panel_orientation_quirks    32768  1 drm
2025-12-04T09:40:06.9599157Z backlight              24576  1 drm
2025-12-04T09:40:06.9599519Z i2c_core              110592  2 nvidia,drm
2025-12-04T09:40:06.9599867Z xt_conntrack           16384  1
2025-12-04T09:40:06.9600190Z nft_chain_nat          16384  3
2025-12-04T09:40:06.9600510Z xt_MASQUERADE          20480  1
2025-12-04T09:40:06.9601019Z nf_nat                 57344  2 nft_chain_nat,xt_MASQUERADE
2025-12-04T09:40:06.9601442Z nf_conntrack_netlink    57344  0
2025-12-04T09:40:06.9601941Z nf_conntrack          184320  4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE
2025-12-04T09:40:06.9602572Z nf_defrag_ipv6         24576  1 nf_conntrack
2025-12-04T09:40:06.9602950Z nf_defrag_ipv4         16384  1 nf_conntrack
2025-12-04T09:40:06.9603324Z xfrm_user              57344  1
2025-12-04T09:40:06.9603657Z xfrm_algo              16384  1 xfrm_user
2025-12-04T09:40:06.9604002Z xt_addrtype            16384  2
2025-12-04T09:40:06.9604324Z nft_compat             20480  4
2025-12-04T09:40:06.9604703Z nf_tables             311296  57 nft_compat,nft_chain_nat
2025-12-04T09:40:06.9605213Z nfnetlink              20480  4 nft_compat,nf_conntrack_netlink,nf_tables
2025-12-04T09:40:06.9605682Z br_netfilter           36864  0
2025-12-04T09:40:06.9606021Z bridge                323584  1 br_netfilter
2025-12-04T09:40:06.9606395Z stp                    16384  1 bridge
2025-12-04T09:40:06.9606730Z llc                    16384  2 bridge,stp
2025-12-04T09:40:06.9607079Z overlay               167936  0
2025-12-04T09:40:06.9607400Z tls                   139264  0
2025-12-04T09:40:06.9607700Z nls_ascii              16384  1
2025-12-04T09:40:06.9608019Z nls_cp437              20480  1
2025-12-04T09:40:06.9608328Z vfat                   24576  1
2025-12-04T09:40:06.9608629Z fat                    86016  1 vfat
2025-12-04T09:40:06.9608965Z sunrpc                700416  1
2025-12-04T09:40:06.9609269Z i8042                  45056  0
2025-12-04T09:40:06.9609561Z ena                   184320  0
2025-12-04T09:40:06.9609874Z skx_edac_common        28672  0
2025-12-04T09:40:06.9610195Z serio                  28672  3 i8042
2025-12-04T09:40:06.9610540Z ghash_clmulni_intel    16384  0
2025-12-04T09:40:06.9610845Z button                 24576  0
2025-12-04T09:40:06.9611156Z sch_fq_codel           20480  17
2025-12-04T09:40:06.9611474Z dm_mod                188416  0
2025-12-04T09:40:06.9611769Z fuse                  184320  1
2025-12-04T09:40:06.9612077Z configfs               57344  1
2025-12-04T09:40:06.9612407Z loop                   36864  0
2025-12-04T09:40:06.9612707Z dmi_sysfs              20480  0
2025-12-04T09:40:06.9613206Z crc32_pclmul           16384  0
2025-12-04T09:40:06.9613521Z crc32c_intel           24576  0
2025-12-04T09:40:06.9613826Z efivarfs               24576  1
2025-12-04T09:40:06.9614142Z + modinfo nvidia
2025-12-04T09:40:06.9616447Z filename:       /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko
2025-12-04T09:40:06.9617019Z import_ns:      DMA_BUF
2025-12-04T09:40:06.9617309Z alias:          char-major-195-*
2025-12-04T09:40:06.9617641Z version:        580.82.07
2025-12-04T09:40:06.9617950Z supported:      external
2025-12-04T09:40:06.9618243Z license:        Dual MIT/GPL
2025-12-04T09:40:06.9618605Z firmware:       nvidia/580.82.07/gsp_tu10x.bin
2025-12-04T09:40:06.9619028Z firmware:       nvidia/580.82.07/gsp_ga10x.bin
2025-12-04T09:40:06.9619415Z srcversion:     BA7240A71DCF7DC6FE88C1D
2025-12-04T09:40:06.9619828Z alias:          of:N*T*Cnvidia,tegra264-displayC*
2025-12-04T09:40:06.9620266Z alias:          of:N*T*Cnvidia,tegra264-display
2025-12-04T09:40:06.9620690Z alias:          of:N*T*Cnvidia,tegra234-displayC*
2025-12-04T09:40:06.9621126Z alias:          of:N*T*Cnvidia,tegra234-display
2025-12-04T09:40:06.9621724Z alias:          pci:v000010DEd*sv*sd*bc06sc80i00*
2025-12-04T09:40:06.9622154Z alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
2025-12-04T09:40:06.9622555Z alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
2025-12-04T09:40:06.9622944Z depends:        i2c-core,drm
2025-12-04T09:40:06.9623260Z retpoline:      Y
2025-12-04T09:40:06.9623513Z name:           nvidia
2025-12-04T09:40:06.9623965Z vermagic:       6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 
2025-12-04T09:40:06.9624561Z parm:           NvSwitchRegDwords:NvSwitch regkey (charp)
2025-12-04T09:40:06.9625118Z parm:           NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
2025-12-04T09:40:06.9625634Z parm:           NVreg_ResmanDebugLevel:int
2025-12-04T09:40:06.9626020Z parm:           NVreg_RmLogonRC:int
2025-12-04T09:40:06.9626389Z parm:           NVreg_ModifyDeviceFiles:int
2025-12-04T09:40:06.9626772Z parm:           NVreg_DeviceFileUID:int
2025-12-04T09:40:06.9627149Z parm:           NVreg_DeviceFileGID:int
2025-12-04T09:40:06.9627531Z parm:           NVreg_DeviceFileMode:int
2025-12-04T09:40:06.9627965Z parm:           NVreg_InitializeSystemMemoryAllocations:int
2025-12-04T09:40:06.9628448Z parm:           NVreg_UsePageAttributeTable:int
2025-12-04T09:40:06.9628864Z parm:           NVreg_EnablePCIeGen3:int
2025-12-04T09:40:06.9629245Z parm:           NVreg_EnableMSI:int
2025-12-04T09:40:06.9629611Z parm:           NVreg_EnableStreamMemOPs:int
2025-12-04T09:40:06.9630057Z parm:           NVreg_RestrictProfilingToAdminUsers:int
2025-12-04T09:40:06.9630544Z parm:           NVreg_PreserveVideoMemoryAllocations:int
2025-12-04T09:40:06.9631003Z parm:           NVreg_EnableS0ixPowerManagement:int
2025-12-04T09:40:06.9631505Z parm:           NVreg_S0ixPowerManagementVideoMemoryThreshold:int
2025-12-04T09:40:06.9632013Z parm:           NVreg_DynamicPowerManagement:int
2025-12-04T09:40:06.9632520Z parm:           NVreg_DynamicPowerManagementVideoMemoryThreshold:int
2025-12-04T09:40:06.9633026Z parm:           NVreg_EnableGpuFirmware:int
2025-12-04T09:40:06.9633447Z parm:           NVreg_EnableGpuFirmwareLogs:int
2025-12-04T09:40:06.9633907Z parm:           NVreg_OpenRmEnableUnsupportedGpus:int
2025-12-04T09:40:06.9634357Z parm:           NVreg_EnableUserNUMAManagement:int
2025-12-04T09:40:06.9634779Z parm:           NVreg_MemoryPoolSize:int
2025-12-04T09:40:06.9635184Z parm:           NVreg_KMallocHeapMaxSize:int
2025-12-04T09:40:06.9635580Z parm:           NVreg_VMallocHeapMaxSize:int
2025-12-04T09:40:06.9635981Z parm:           NVreg_IgnoreMMIOCheck:int
2025-12-04T09:40:06.9636370Z parm:           NVreg_NvLinkDisable:int
2025-12-04T09:40:06.9636785Z parm:           NVreg_EnablePCIERelaxedOrderingMode:int
2025-12-04T09:40:06.9637231Z parm:           NVreg_RegisterPCIDriver:int
2025-12-04T09:40:06.9637673Z parm:           NVreg_RegisterPlatformDeviceDriver:int
2025-12-04T09:40:06.9638215Z parm:           NVreg_EnableResizableBar:int
2025-12-04T09:40:06.9638618Z parm:           NVreg_EnableDbgBreakpoint:int
2025-12-04T09:40:06.9639061Z parm:           NVreg_EnableNonblockingOpen:int
2025-12-04T09:40:06.9639502Z parm:           NVreg_CoherentGPUMemoryMode:charp
2025-12-04T09:40:06.9639917Z parm:           NVreg_RegistryDwords:charp
2025-12-04T09:40:06.9640342Z parm:           NVreg_RegistryDwordsPerDevice:charp
2025-12-04T09:40:06.9640754Z parm:           NVreg_RmMsg:charp
2025-12-04T09:40:06.9641099Z parm:           NVreg_GpuBlacklist:charp
2025-12-04T09:40:06.9641503Z parm:           NVreg_TemporaryFilePath:charp
2025-12-04T09:40:06.9641908Z parm:           NVreg_ExcludedGpus:charp
2025-12-04T09:40:06.9642383Z parm:           NVreg_DmaRemapPeerMmio:int
2025-12-04T09:40:06.9642795Z parm:           NVreg_RmNvlinkBandwidth:charp
2025-12-04T09:40:06.9643236Z parm:           NVreg_RmNvlinkBandwidthLinkCount:int
2025-12-04T09:40:06.9643671Z parm:           NVreg_ImexChannelCount:int
2025-12-04T09:40:06.9644068Z parm:           NVreg_CreateImexChannel0:int
2025-12-04T09:40:06.9644494Z parm:           NVreg_GrdmaPciTopoCheckOverride:int
2025-12-04T09:40:06.9644987Z parm:           rm_firmware_active:charp
2025-12-04T09:40:06.9645337Z + HAS_NVIDIA_DRIVER=0
2025-12-04T09:40:06.9645638Z ++ command -v nvidia-smi
2025-12-04T09:40:06.9645955Z + '[' -x /usr/bin/nvidia-smi ']'
2025-12-04T09:40:06.9646260Z + set +e
2025-12-04T09:40:06.9646641Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0
2025-12-04T09:40:08.4854083Z + INSTALLED_DRIVER_VERSION=580.82.07
2025-12-04T09:40:08.4854512Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:40:08.4854819Z + '[' 0 -ne 0 ']'
2025-12-04T09:40:08.4855085Z + '[' 580.82.07 '!=' 525.105.17 ']'
2025-12-04T09:40:08.4855690Z + echo 'NVIDIA driver (580.82.07) has been installed, but we expect to have 525.105.17 instead. Continuing'
2025-12-04T09:40:08.4856355Z + sudo killall nvidia-persistenced
2025-12-04T09:40:08.4856947Z NVIDIA driver (580.82.07) has been installed, but we expect to have 525.105.17 instead. Continuing
2025-12-04T09:40:08.6337980Z nvidia-persistenced: no process found
2025-12-04T09:40:08.6356102Z + true
2025-12-04T09:40:08.6356386Z + set -e
2025-12-04T09:40:08.6356811Z + '[' 0 -eq 0 ']'
2025-12-04T09:40:08.6357078Z + '[' amzn2023 '!=' ubuntu20.04 ']'
2025-12-04T09:40:08.6357478Z + sudo yum groupinstall -y 'Development Tools'
2025-12-04T09:40:09.1498352Z Last metadata expiration check: 0:22:20 ago on Thu Dec  4 09:17:49 2025.
2025-12-04T09:40:09.1943778Z No match for group package "system-rpm-config"
2025-12-04T09:40:09.1964050Z No match for group package "rcs"
2025-12-04T09:40:09.1990305Z No match for group package "pkgconfig"
2025-12-04T09:40:09.2577623Z Dependencies resolved.
2025-12-04T09:40:09.2916783Z ================================================================================
2025-12-04T09:40:09.2917365Z  Package           Architecture     Version             Repository         Size
2025-12-04T09:40:09.2917900Z ================================================================================
2025-12-04T09:40:09.2918319Z Installing Groups:
2025-12-04T09:40:09.2918707Z  Development Tools                                                             
2025-12-04T09:40:09.2919071Z 
2025-12-04T09:40:09.2919178Z Transaction Summary
2025-12-04T09:40:09.2919483Z ================================================================================
2025-12-04T09:40:09.2919758Z 
2025-12-04T09:40:09.5071444Z ================================================================================
2025-12-04T09:40:09.5071918Z WARNING:
2025-12-04T09:40:09.5072225Z   A newer release of "Amazon Linux" is available.
2025-12-04T09:40:09.5072516Z 
2025-12-04T09:40:09.5072625Z   Available Versions:
2025-12-04T09:40:09.5072819Z 
2025-12-04T09:40:09.5072934Z   Version 2023.9.20250929:
2025-12-04T09:40:09.5073328Z     Run the following command to upgrade to 2023.9.20250929:
2025-12-04T09:40:09.5073652Z 
2025-12-04T09:40:09.5073818Z       dnf upgrade --releasever=2023.9.20250929
2025-12-04T09:40:09.5074325Z 
2025-12-04T09:40:09.5074427Z     Release notes:
2025-12-04T09:40:09.5074963Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html
2025-12-04T09:40:09.5075441Z 
2025-12-04T09:40:09.5075563Z   Version 2023.9.20251014:
2025-12-04T09:40:09.5075954Z     Run the following command to upgrade to 2023.9.20251014:
2025-12-04T09:40:09.5076276Z 
2025-12-04T09:40:09.5076414Z       dnf upgrade --releasever=2023.9.20251014
2025-12-04T09:40:09.5076698Z 
2025-12-04T09:40:09.5076795Z     Release notes:
2025-12-04T09:40:09.5077292Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html
2025-12-04T09:40:09.5077762Z 
2025-12-04T09:40:09.5077867Z   Version 2023.9.20251020:
2025-12-04T09:40:09.5078249Z     Run the following command to upgrade to 2023.9.20251020:
2025-12-04T09:40:09.5078578Z 
2025-12-04T09:40:09.5078714Z       dnf upgrade --releasever=2023.9.20251020
2025-12-04T09:40:09.5078970Z 
2025-12-04T09:40:09.5079084Z     Release notes:
2025-12-04T09:40:09.5079565Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html
2025-12-04T09:40:09.5080043Z 
2025-12-04T09:40:09.5080278Z   Version 2023.9.20251027:
2025-12-04T09:40:09.5080667Z     Run the following command to upgrade to 2023.9.20251027:
2025-12-04T09:40:09.5080983Z 
2025-12-04T09:40:09.5081132Z       dnf upgrade --releasever=2023.9.20251027
2025-12-04T09:40:09.5081390Z 
2025-12-04T09:40:09.5081488Z     Release notes:
2025-12-04T09:40:09.5081985Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html
2025-12-04T09:40:09.5082527Z 
2025-12-04T09:40:09.5082645Z   Version 2023.9.20251105:
2025-12-04T09:40:09.5083012Z     Run the following command to upgrade to 2023.9.20251105:
2025-12-04T09:40:09.5083345Z 
2025-12-04T09:40:09.5083482Z       dnf upgrade --releasever=2023.9.20251105
2025-12-04T09:40:09.5083759Z 
2025-12-04T09:40:09.5083859Z     Release notes:
2025-12-04T09:40:09.5084351Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html
2025-12-04T09:40:09.5084824Z 
2025-12-04T09:40:09.5084929Z   Version 2023.9.20251110:
2025-12-04T09:40:09.5085319Z     Run the following command to upgrade to 2023.9.20251110:
2025-12-04T09:40:09.5085638Z 
2025-12-04T09:40:09.5085787Z       dnf upgrade --releasever=2023.9.20251110
2025-12-04T09:40:09.5086048Z 
2025-12-04T09:40:09.5086149Z     Release notes:
2025-12-04T09:40:09.5086641Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html
2025-12-04T09:40:09.5087119Z 
2025-12-04T09:40:09.5087223Z   Version 2023.9.20251117:
2025-12-04T09:40:09.5087602Z     Run the following command to upgrade to 2023.9.20251117:
2025-12-04T09:40:09.5087916Z 
2025-12-04T09:40:09.5088050Z       dnf upgrade --releasever=2023.9.20251117
2025-12-04T09:40:09.5088321Z 
2025-12-04T09:40:09.5088420Z     Release notes:
2025-12-04T09:40:09.5088911Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html
2025-12-04T09:40:09.5089384Z 
2025-12-04T09:40:09.5089527Z ================================================================================
2025-12-04T09:40:09.5089903Z Complete!
2025-12-04T09:40:09.5539006Z ++ uname -r
2025-12-04T09:40:09.5549528Z + sudo yum install -y 'kernel-devel-uname-r == 6.1.150-174.273.amzn2023.x86_64'
2025-12-04T09:40:10.0989915Z Last metadata expiration check: 0:22:21 ago on Thu Dec  4 09:17:49 2025.
2025-12-04T09:40:10.1294192Z Using '==' operator in reldeps can result in an undefined behavior. It is deprecated and the support will be dropped in future versions. Use '=' operator instead.
2025-12-04T09:40:10.1418781Z Package kernel-devel-1:6.1.150-174.273.amzn2023.x86_64 is already installed.
2025-12-04T09:40:10.2036199Z Dependencies resolved.
2025-12-04T09:40:10.2371316Z Nothing to do.
2025-12-04T09:40:10.2372018Z Complete!
2025-12-04T09:40:10.2795064Z + sudo modprobe backlight
2025-12-04T09:40:10.4189862Z + sudo curl -fsL -o /tmp/nvidia_driver https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-525.105.17.run
2025-12-04T09:40:14.7857630Z + set +e
2025-12-04T09:40:14.7858131Z + sudo /bin/bash /tmp/nvidia_driver -s --no-drm
2025-12-04T09:40:16.2585947Z Verifying archive integrity... OK
2025-12-04T09:40:43.6226299Z Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 525.105.17...................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
2025-12-04T09:40:44.1638808Z 
2025-12-04T09:40:44.1639724Z WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver.
2025-12-04T09:40:44.1640424Z 
2025-12-04T09:41:10.0974212Z 
2025-12-04T09:41:10.0976287Z WARNING: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules'; these paths were not queryable from the system.  If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver.
2025-12-04T09:41:10.0977954Z 
2025-12-04T09:41:10.0994053Z 
2025-12-04T09:41:10.0995488Z WARNING: This NVIDIA driver package includes Vulkan components, but no Vulkan ICD loader was detected on this system. The NVIDIA Vulkan ICD will not function without the loader. Most distributions package the Vulkan loader; try installing the "vulkan-loader", "vulkan-icd-loader", or "libvulkan1" package.
2025-12-04T09:41:10.0996965Z 
2025-12-04T09:41:21.5384228Z + NVIDIA_INSTALLATION_STATUS=0
2025-12-04T09:41:21.5384657Z + RESET_GPU=0
2025-12-04T09:41:21.5384937Z + '[' 0 -ne 0 ']'
2025-12-04T09:41:21.5386324Z ++ command -v nvidia-smi
2025-12-04T09:41:21.5389482Z + '[' -x /usr/bin/nvidia-smi ']'
2025-12-04T09:41:21.5393319Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0
2025-12-04T09:41:24.1232702Z + INSTALLED_DRIVER_VERSION=525.105.17
2025-12-04T09:41:24.1233131Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:41:24.1233413Z + '[' 0 -ne 0 ']'
2025-12-04T09:41:24.1233675Z + '[' 0 -eq 1 ']'
2025-12-04T09:41:24.1234089Z + sudo rm -fv /tmp/nvidia_driver
2025-12-04T09:41:24.2705427Z removed '/tmp/nvidia_driver'
2025-12-04T09:41:24.2724880Z + set -e
2025-12-04T09:41:24.2727304Z + post_install_nvidia_driver_common
2025-12-04T09:41:24.2730946Z + sudo modprobe nvidia
2025-12-04T09:41:24.4685087Z + echo 'After installing NVIDIA driver'
2025-12-04T09:41:24.4685777Z + lspci
2025-12-04T09:41:24.4686195Z After installing NVIDIA driver
2025-12-04T09:41:24.4820713Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma]
2025-12-04T09:41:24.4821362Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
2025-12-04T09:41:24.4822072Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
2025-12-04T09:41:24.4822730Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111
2025-12-04T09:41:24.4823341Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe EBS Controller
2025-12-04T09:41:24.4824020Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
2025-12-04T09:41:24.4824648Z 00:1e.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
2025-12-04T09:41:24.4825652Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller
2025-12-04T09:41:24.4826183Z + lsmod
2025-12-04T09:41:24.4850576Z Module                  Size  Used by
2025-12-04T09:41:24.4850929Z nvidia              56537088  0
2025-12-04T09:41:24.4851249Z drm                   602112  1 nvidia
2025-12-04T09:41:24.4851634Z drm_panel_orientation_quirks    32768  1 drm
2025-12-04T09:41:24.4852003Z backlight              24576  1 drm
2025-12-04T09:41:24.4852362Z i2c_core              110592  2 nvidia,drm
2025-12-04T09:41:24.4852724Z xt_conntrack           16384  1
2025-12-04T09:41:24.4853035Z nft_chain_nat          16384  3
2025-12-04T09:41:24.4853351Z xt_MASQUERADE          20480  1
2025-12-04T09:41:24.4853718Z nf_nat                 57344  2 nft_chain_nat,xt_MASQUERADE
2025-12-04T09:41:24.4854127Z nf_conntrack_netlink    57344  0
2025-12-04T09:41:24.4854630Z nf_conntrack          184320  4 xt_conntrack,nf_nat,nf_conntrack_netlink,xt_MASQUERADE
2025-12-04T09:41:24.4855191Z nf_defrag_ipv6         24576  1 nf_conntrack
2025-12-04T09:41:24.4855743Z nf_defrag_ipv4         16384  1 nf_conntrack
2025-12-04T09:41:24.4856106Z xfrm_user              57344  1
2025-12-04T09:41:24.4856440Z xfrm_algo              16384  1 xfrm_user
2025-12-04T09:41:24.4856806Z xt_addrtype            16384  2
2025-12-04T09:41:24.4857129Z nft_compat             20480  4
2025-12-04T09:41:24.4857498Z nf_tables             311296  57 nft_compat,nft_chain_nat
2025-12-04T09:41:24.4858027Z nfnetlink              20480  4 nft_compat,nf_conntrack_netlink,nf_tables
2025-12-04T09:41:24.4858500Z br_netfilter           36864  0
2025-12-04T09:41:24.4858828Z bridge                323584  1 br_netfilter
2025-12-04T09:41:24.4859197Z stp                    16384  1 bridge
2025-12-04T09:41:24.4859550Z llc                    16384  2 bridge,stp
2025-12-04T09:41:24.4859888Z overlay               167936  0
2025-12-04T09:41:24.4860196Z tls                   139264  0
2025-12-04T09:41:24.4860509Z nls_ascii              16384  1
2025-12-04T09:41:24.4860804Z nls_cp437              20480  1
2025-12-04T09:41:24.4861118Z vfat                   24576  1
2025-12-04T09:41:24.4861425Z fat                    86016  1 vfat
2025-12-04T09:41:24.4861754Z sunrpc                700416  1
2025-12-04T09:41:24.4862045Z i8042                  45056  0
2025-12-04T09:41:24.4862347Z ena                   184320  0
2025-12-04T09:41:24.4862661Z skx_edac_common        28672  0
2025-12-04T09:41:24.4862967Z serio                  28672  3 i8042
2025-12-04T09:41:24.4863312Z ghash_clmulni_intel    16384  0
2025-12-04T09:41:24.4863630Z button                 24576  0
2025-12-04T09:41:24.4863928Z sch_fq_codel           20480  17
2025-12-04T09:41:24.4864243Z dm_mod                188416  0
2025-12-04T09:41:24.4864544Z fuse                  184320  1
2025-12-04T09:41:24.4864835Z configfs               57344  1
2025-12-04T09:41:24.4865143Z loop                   36864  0
2025-12-04T09:41:24.4865451Z dmi_sysfs              20480  0
2025-12-04T09:41:24.4865752Z crc32_pclmul           16384  0
2025-12-04T09:41:24.4866066Z crc32c_intel           24576  0
2025-12-04T09:41:24.4866382Z efivarfs               24576  1
2025-12-04T09:41:24.4866690Z + modinfo nvidia
2025-12-04T09:41:24.4868117Z filename:       /lib/modules/6.1.150-174.273.amzn2023.x86_64/kernel/drivers/video/nvidia.ko
2025-12-04T09:41:24.4868731Z firmware:       nvidia/525.105.17/gsp_tu10x.bin
2025-12-04T09:41:24.4869160Z firmware:       nvidia/525.105.17/gsp_ad10x.bin
2025-12-04T09:41:24.4869549Z alias:          char-major-195-*
2025-12-04T09:41:24.4869881Z version:        525.105.17
2025-12-04T09:41:24.4870190Z supported:      external
2025-12-04T09:41:24.4870475Z license:        NVIDIA
2025-12-04T09:41:24.4870776Z srcversion:     98F82D76E0EF3952EEE57A7
2025-12-04T09:41:24.4871172Z alias:          pci:v000010DEd*sv*sd*bc06sc80i00*
2025-12-04T09:41:24.4871598Z alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
2025-12-04T09:41:24.4872005Z alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
2025-12-04T09:41:24.4872498Z depends:        i2c-core,drm
2025-12-04T09:41:24.4872831Z retpoline:      Y
2025-12-04T09:41:24.4873091Z name:           nvidia
2025-12-04T09:41:24.4873546Z vermagic:       6.1.150-174.273.amzn2023.x86_64 SMP preempt mod_unload modversions 
2025-12-04T09:41:24.4874146Z parm:           NvSwitchRegDwords:NvSwitch regkey (charp)
2025-12-04T09:41:24.4874708Z parm:           NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
2025-12-04T09:41:24.4875222Z parm:           NVreg_ResmanDebugLevel:int
2025-12-04T09:41:24.4875610Z parm:           NVreg_RmLogonRC:int
2025-12-04T09:41:24.4875983Z parm:           NVreg_ModifyDeviceFiles:int
2025-12-04T09:41:24.4876361Z parm:           NVreg_DeviceFileUID:int
2025-12-04T09:41:24.4876741Z parm:           NVreg_DeviceFileGID:int
2025-12-04T09:41:24.4877120Z parm:           NVreg_DeviceFileMode:int
2025-12-04T09:41:24.4877554Z parm:           NVreg_InitializeSystemMemoryAllocations:int
2025-12-04T09:41:24.4878034Z parm:           NVreg_UsePageAttributeTable:int
2025-12-04T09:41:24.4878450Z parm:           NVreg_EnablePCIeGen3:int
2025-12-04T09:41:24.4878897Z parm:           NVreg_EnableMSI:int
2025-12-04T09:41:24.4879263Z parm:           NVreg_TCEBypassMode:int
2025-12-04T09:41:24.4879661Z parm:           NVreg_EnableStreamMemOPs:int
2025-12-04T09:41:24.4880114Z parm:           NVreg_RestrictProfilingToAdminUsers:int
2025-12-04T09:41:24.4880590Z parm:           NVreg_PreserveVideoMemoryAllocations:int
2025-12-04T09:41:24.4881065Z parm:           NVreg_EnableS0ixPowerManagement:int
2025-12-04T09:41:24.4881577Z parm:           NVreg_S0ixPowerManagementVideoMemoryThreshold:int
2025-12-04T09:41:24.4882070Z parm:           NVreg_DynamicPowerManagement:int
2025-12-04T09:41:24.4882659Z parm:           NVreg_DynamicPowerManagementVideoMemoryThreshold:int
2025-12-04T09:41:24.4883169Z parm:           NVreg_EnableGpuFirmware:int
2025-12-04T09:41:24.4883575Z parm:           NVreg_EnableGpuFirmwareLogs:int
2025-12-04T09:41:24.4884100Z parm:           NVreg_OpenRmEnableUnsupportedGpus:int
2025-12-04T09:41:24.4884559Z parm:           NVreg_EnableUserNUMAManagement:int
2025-12-04T09:41:24.4884986Z parm:           NVreg_MemoryPoolSize:int
2025-12-04T09:41:24.4885367Z parm:           NVreg_KMallocHeapMaxSize:int
2025-12-04T09:41:24.4885775Z parm:           NVreg_VMallocHeapMaxSize:int
2025-12-04T09:41:24.4886178Z parm:           NVreg_IgnoreMMIOCheck:int
2025-12-04T09:41:24.4886555Z parm:           NVreg_NvLinkDisable:int
2025-12-04T09:41:24.4886985Z parm:           NVreg_EnablePCIERelaxedOrderingMode:int
2025-12-04T09:41:24.4887429Z parm:           NVreg_RegisterPCIDriver:int
2025-12-04T09:41:24.4887830Z parm:           NVreg_EnableDbgBreakpoint:int
2025-12-04T09:41:24.4888244Z parm:           NVreg_RegistryDwords:charp
2025-12-04T09:41:24.4888668Z parm:           NVreg_RegistryDwordsPerDevice:charp
2025-12-04T09:41:24.4889084Z parm:           NVreg_RmMsg:charp
2025-12-04T09:41:24.4889433Z parm:           NVreg_GpuBlacklist:charp
2025-12-04T09:41:24.4889840Z parm:           NVreg_TemporaryFilePath:charp
2025-12-04T09:41:24.4890243Z parm:           NVreg_ExcludedGpus:charp
2025-12-04T09:41:24.4890625Z parm:           NVreg_DmaRemapPeerMmio:int
2025-12-04T09:41:24.4891012Z parm:           rm_firmware_active:charp
2025-12-04T09:41:24.4891359Z + set +e
2025-12-04T09:41:24.4891580Z + nvidia-smi
2025-12-04T09:41:26.4660911Z Thu Dec  4 09:41:26 2025       
2025-12-04T09:41:26.4661457Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:26.4662089Z | NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
2025-12-04T09:41:26.4662685Z |-------------------------------+----------------------+----------------------+
2025-12-04T09:41:26.4663274Z | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:41:26.4663930Z | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:41:26.4664464Z |                               |                      |               MIG M. |
2025-12-04T09:41:26.4665178Z |===============================+======================+======================|
2025-12-04T09:41:26.4739958Z |   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
2025-12-04T09:41:26.4740512Z | N/A   25C    P0    27W /  70W |      2MiB / 15360MiB |      4%      Default |
2025-12-04T09:41:26.4741071Z |                               |                      |                  N/A |
2025-12-04T09:41:26.4741525Z +-------------------------------+----------------------+----------------------+
2025-12-04T09:41:26.4742000Z                                                                                
2025-12-04T09:41:26.4742460Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:26.4742967Z | Processes:                                                                  |
2025-12-04T09:41:26.4743481Z |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2025-12-04T09:41:26.4743985Z |        ID   ID                                                   Usage      |
2025-12-04T09:41:26.4744608Z |=============================================================================|
2025-12-04T09:41:26.4745133Z |  No running processes found                                                 |
2025-12-04T09:41:26.4745687Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:26.9276738Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T09:41:28.8952110Z Tesla T4
2025-12-04T09:41:29.2969546Z + NVIDIA_SMI_STATUS=0
2025-12-04T09:41:29.2969906Z + '[' 0 -eq 0 ']'
2025-12-04T09:41:29.2970193Z + echo 'INFO: Ignoring allowed status 0'
2025-12-04T09:41:29.2970554Z + set -e
2025-12-04T09:41:29.2970809Z INFO: Ignoring allowed status 0
2025-12-04T09:41:29.2977034Z == Installing nvidia container toolkit for amzn2023 ==
2025-12-04T09:41:29.2981205Z + sudo yum install -y yum-utils
2025-12-04T09:41:29.8140999Z Last metadata expiration check: 0:23:40 ago on Thu Dec  4 09:17:49 2025.
2025-12-04T09:41:29.8470924Z Package dnf-utils-4.3.0-13.amzn2023.0.5.noarch is already installed.
2025-12-04T09:41:29.9081594Z Dependencies resolved.
2025-12-04T09:41:29.9418563Z Nothing to do.
2025-12-04T09:41:29.9419845Z Complete!
2025-12-04T09:41:30.0551755Z + [[ amzn2023 == \a\m\z\n\2\0\2\3 ]]
2025-12-04T09:41:30.0552527Z + YUM_REPO_URL=https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:41:30.0553652Z + sudo yum-config-manager --add-repo https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:41:30.4319172Z Adding repo from: https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo
2025-12-04T09:41:30.4911181Z + sudo yum install -y nvidia-container-toolkit-1.17.8 libnvidia-container-tools-1.17.8 libnvidia-container1-1.17.8 nvidia-container-toolkit-base-1.17.8
2025-12-04T09:41:31.1450748Z nvidia-container-toolkit                         19 kB/s | 833  B     00:00    
2025-12-04T09:41:31.2485703Z Dependencies resolved.
2025-12-04T09:41:31.2822789Z ================================================================================
2025-12-04T09:41:31.2823329Z  Package                       Arch   Version    Repository                Size
2025-12-04T09:41:31.2823809Z ================================================================================
2025-12-04T09:41:31.2824179Z Downgrading:
2025-12-04T09:41:31.2824641Z  libnvidia-container-tools     x86_64 1.17.8-1   nvidia-container-toolkit  40 k
2025-12-04T09:41:31.2825361Z  libnvidia-container1          x86_64 1.17.8-1   nvidia-container-toolkit 1.0 M
2025-12-04T09:41:31.2826055Z  nvidia-container-toolkit      x86_64 1.17.8-1   nvidia-container-toolkit 1.2 M
2025-12-04T09:41:31.2826804Z  nvidia-container-toolkit-base x86_64 1.17.8-1   nvidia-container-toolkit 5.8 M
2025-12-04T09:41:31.2827266Z 
2025-12-04T09:41:31.2827371Z Transaction Summary
2025-12-04T09:41:31.2827673Z ================================================================================
2025-12-04T09:41:31.2828314Z Downgrade  4 Packages
2025-12-04T09:41:31.2828509Z 
2025-12-04T09:41:31.2828636Z Total download size: 8.0 M
2025-12-04T09:41:31.2830158Z Downloading Packages:
2025-12-04T09:41:31.3236570Z (1/4): libnvidia-container-tools-1.17.8-1.x86_6 1.0 MB/s |  40 kB     00:00    
2025-12-04T09:41:31.3814796Z (2/4): libnvidia-container1-1.17.8-1.x86_64.rpm  10 MB/s | 1.0 MB     00:00    
2025-12-04T09:41:31.4351010Z (3/4): nvidia-container-toolkit-1.17.8-1.x86_64 8.2 MB/s | 1.2 MB     00:00    
2025-12-04T09:41:31.5698092Z (4/4): nvidia-container-toolkit-base-1.17.8-1.x  23 MB/s | 5.8 MB     00:00    
2025-12-04T09:41:31.5710379Z --------------------------------------------------------------------------------
2025-12-04T09:41:31.5715508Z Total                                            28 MB/s | 8.0 MB     00:00     
2025-12-04T09:41:31.5719210Z Running transaction check
2025-12-04T09:41:31.5879711Z Transaction check succeeded.
2025-12-04T09:41:31.5880076Z Running transaction test
2025-12-04T09:41:31.6438864Z Transaction test succeeded.
2025-12-04T09:41:31.6443402Z Running transaction
2025-12-04T09:41:32.6546223Z   Preparing        :                                                        1/1 
2025-12-04T09:41:32.8027049Z   Downgrading      : nvidia-container-toolkit-base-1.17.8-1.x86_64          1/8 
2025-12-04T09:41:32.8313633Z   Downgrading      : libnvidia-container1-1.17.8-1.x86_64                   2/8 
2025-12-04T09:41:32.9098275Z   Running scriptlet: libnvidia-container1-1.17.8-1.x86_64                   2/8 
2025-12-04T09:41:33.0689705Z   Downgrading      : libnvidia-container-tools-1.17.8-1.x86_64              3/8 
2025-12-04T09:41:33.0995427Z   Downgrading      : nvidia-container-toolkit-1.17.8-1.x86_64               4/8 
2025-12-04T09:41:33.1688647Z   Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64               4/8 
2025-12-04T09:41:33.1758470Z   Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:41:33.1759614Z   Cleanup          : nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:41:33.2077231Z   Running scriptlet: nvidia-container-toolkit-1.18.1-1.x86_64               5/8 
2025-12-04T09:41:33.2139712Z   Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:41:33.2140855Z   Cleanup          : libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:41:33.2520573Z   Running scriptlet: libnvidia-container-tools-1.18.1-1.x86_64              6/8 
2025-12-04T09:41:33.2591447Z   Running scriptlet: libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:41:33.2592640Z   Cleanup          : libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:41:33.2949463Z   Running scriptlet: libnvidia-container1-1.18.1-1.x86_64                   7/8 
2025-12-04T09:41:33.3015457Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:41:33.3016877Z   Cleanup          : nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:41:33.3424797Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:41:33.3997043Z   Running scriptlet: nvidia-container-toolkit-1.17.8-1.x86_64               8/8 
2025-12-04T09:41:34.8919455Z   Running scriptlet: nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8 
2025-12-04T09:41:34.8920241Z   Verifying        : libnvidia-container-tools-1.17.8-1.x86_64              1/8 
2025-12-04T09:41:34.8920929Z   Verifying        : libnvidia-container-tools-1.18.1-1.x86_64              2/8 
2025-12-04T09:41:34.8921601Z   Verifying        : libnvidia-container1-1.17.8-1.x86_64                   3/8 
2025-12-04T09:41:34.8922298Z   Verifying        : libnvidia-container1-1.18.1-1.x86_64                   4/8 
2025-12-04T09:41:34.8922969Z   Verifying        : nvidia-container-toolkit-1.17.8-1.x86_64               5/8 
2025-12-04T09:41:34.8923633Z   Verifying        : nvidia-container-toolkit-1.18.1-1.x86_64               6/8 
2025-12-04T09:41:34.8925083Z   Verifying        : nvidia-container-toolkit-base-1.17.8-1.x86_64          7/8 
2025-12-04T09:41:35.0554752Z   Verifying        : nvidia-container-toolkit-base-1.18.1-1.x86_64          8/8================================================================================
2025-12-04T09:41:35.0555479Z WARNING:
2025-12-04T09:41:35.0555789Z   A newer release of "Amazon Linux" is available.
2025-12-04T09:41:35.0556078Z 
2025-12-04T09:41:35.0556201Z   Available Versions:
2025-12-04T09:41:35.0556383Z 
2025-12-04T09:41:35.0556489Z   Version 2023.9.20250929:
2025-12-04T09:41:35.0556881Z     Run the following command to upgrade to 2023.9.20250929:
2025-12-04T09:41:35.0557218Z 
2025-12-04T09:41:35.0557383Z       dnf upgrade --releasever=2023.9.20250929
2025-12-04T09:41:35.0557650Z 
2025-12-04T09:41:35.0557767Z     Release notes:
2025-12-04T09:41:35.0558270Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20250929.html
2025-12-04T09:41:35.0558772Z 
2025-12-04T09:41:35.0558878Z   Version 2023.9.20251014:
2025-12-04T09:41:35.0559504Z     Run the following command to upgrade to 2023.9.20251014:
2025-12-04T09:41:35.0559836Z 
2025-12-04T09:41:35.0559977Z       dnf upgrade --releasever=2023.9.20251014
2025-12-04T09:41:35.0560257Z 
2025-12-04T09:41:35.0560357Z     Release notes:
2025-12-04T09:41:35.0560861Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251014.html
2025-12-04T09:41:35.0561330Z 
2025-12-04T09:41:35.0561451Z   Version 2023.9.20251020:
2025-12-04T09:41:35.0561826Z     Run the following command to upgrade to 2023.9.20251020:
2025-12-04T09:41:35.0562225Z 
2025-12-04T09:41:35.0562366Z       dnf upgrade --releasever=2023.9.20251020
2025-12-04T09:41:35.0562630Z 
2025-12-04T09:41:35.0562748Z     Release notes:
2025-12-04T09:41:35.0563232Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251020.html
2025-12-04T09:41:35.0563723Z 
2025-12-04T09:41:35.0563835Z   Version 2023.9.20251027:
2025-12-04T09:41:35.0564223Z     Run the following command to upgrade to 2023.9.20251027:
2025-12-04T09:41:35.0564540Z 
2025-12-04T09:41:35.0564694Z       dnf upgrade --releasever=2023.9.20251027
2025-12-04T09:41:35.0564956Z 
2025-12-04T09:41:35.0565059Z     Release notes:
2025-12-04T09:41:35.0565550Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251027.html
2025-12-04T09:41:35.0566022Z 
2025-12-04T09:41:35.0566140Z   Version 2023.9.20251105:
2025-12-04T09:41:35.0566518Z     Run the following command to upgrade to 2023.9.20251105:
2025-12-04T09:41:35.0566835Z 
2025-12-04T09:41:35.0566972Z       dnf upgrade --releasever=2023.9.20251105
2025-12-04T09:41:35.0567243Z 
2025-12-04T09:41:35.0567339Z     Release notes:
2025-12-04T09:41:35.0567826Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251105.html
2025-12-04T09:41:35.0568291Z 
2025-12-04T09:41:35.0568394Z   Version 2023.9.20251110:
2025-12-04T09:41:35.0568772Z     Run the following command to upgrade to 2023.9.20251110:
2025-12-04T09:41:35.0569104Z 
2025-12-04T09:41:35.0569239Z       dnf upgrade --releasever=2023.9.20251110
2025-12-04T09:41:35.0569502Z 
2025-12-04T09:41:35.0569612Z     Release notes:
2025-12-04T09:41:35.0570090Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251110.html
2025-12-04T09:41:35.0570574Z 
2025-12-04T09:41:35.0570676Z   Version 2023.9.20251117:
2025-12-04T09:41:35.0571057Z     Run the following command to upgrade to 2023.9.20251117:
2025-12-04T09:41:35.0571370Z 
2025-12-04T09:41:35.0571503Z       dnf upgrade --releasever=2023.9.20251117
2025-12-04T09:41:35.0571778Z 
2025-12-04T09:41:35.0571877Z     Release notes:
2025-12-04T09:41:35.0572368Z      https://docs.aws.amazon.com/linux/al2023/release-notes/relnotes-2023.9.20251117.html
2025-12-04T09:41:35.0572831Z 
2025-12-04T09:41:35.0572979Z ================================================================================
2025-12-04T09:41:35.1239767Z  
2025-12-04T09:41:35.1240198Z 
2025-12-04T09:41:35.1240296Z Downgraded:
2025-12-04T09:41:35.1240755Z   libnvidia-container-tools-1.17.8-1.x86_64                                     
2025-12-04T09:41:35.1241479Z   libnvidia-container1-1.17.8-1.x86_64                                          
2025-12-04T09:41:35.1242214Z   nvidia-container-toolkit-1.17.8-1.x86_64                                      
2025-12-04T09:41:35.1242951Z   nvidia-container-toolkit-base-1.17.8-1.x86_64                                 
2025-12-04T09:41:35.1243385Z 
2025-12-04T09:41:35.1243494Z Complete!
2025-12-04T09:41:35.1814760Z + sudo systemctl restart docker
2025-12-04T09:41:41.1111468Z Thu Dec  4 09:41:41 2025       
2025-12-04T09:41:41.1111969Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:41.1112576Z | NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
2025-12-04T09:41:41.1113170Z |-------------------------------+----------------------+----------------------+
2025-12-04T09:41:41.1113812Z | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:41:41.1114753Z | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:41:41.1115286Z |                               |                      |               MIG M. |
2025-12-04T09:41:41.1115693Z |===============================+======================+======================|
2025-12-04T09:41:41.1210757Z |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
2025-12-04T09:41:41.1211359Z | N/A   25C    P0    27W /  70W |      2MiB / 15360MiB |      7%      Default |
2025-12-04T09:41:41.1211827Z |                               |                      |                  N/A |
2025-12-04T09:41:41.1212295Z +-------------------------------+----------------------+----------------------+
2025-12-04T09:41:41.1212777Z                                                                                
2025-12-04T09:41:41.1213353Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:41.1213959Z | Processes:                                                                  |
2025-12-04T09:41:41.1214511Z |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2025-12-04T09:41:41.1214996Z |        ID   ID                                                   Usage      |
2025-12-04T09:41:41.1215419Z |=============================================================================|
2025-12-04T09:41:41.1215938Z |  No running processes found                                                 |
2025-12-04T09:41:41.1216507Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:41.1946955Z Unable to find image 'public.ecr.aws/docker/library/python:3.13' locally
2025-12-04T09:41:41.4215879Z 3.13: Pulling from docker/library/python
2025-12-04T09:41:41.5196518Z 53c88f1dfeb7: Pulling fs layer
2025-12-04T09:41:41.5197003Z eae668646f44: Pulling fs layer
2025-12-04T09:41:41.5197377Z ff2e6e687b6c: Pulling fs layer
2025-12-04T09:41:41.5197764Z 7c40a3faff76: Pulling fs layer
2025-12-04T09:41:41.5198097Z 967a3b1c8fef: Pulling fs layer
2025-12-04T09:41:41.5198435Z a64e1a44f22a: Pulling fs layer
2025-12-04T09:41:41.5198748Z 52655f8a5bcc: Pulling fs layer
2025-12-04T09:41:41.5199161Z 7c40a3faff76: Waiting
2025-12-04T09:41:41.5199471Z 967a3b1c8fef: Waiting
2025-12-04T09:41:41.5199735Z a64e1a44f22a: Waiting
2025-12-04T09:41:41.5200003Z 52655f8a5bcc: Waiting
2025-12-04T09:41:41.7085887Z eae668646f44: Verifying Checksum
2025-12-04T09:41:41.7086523Z eae668646f44: Download complete
2025-12-04T09:41:41.8051150Z 53c88f1dfeb7: Verifying Checksum
2025-12-04T09:41:41.8051593Z 53c88f1dfeb7: Download complete
2025-12-04T09:41:41.8872854Z 967a3b1c8fef: Verifying Checksum
2025-12-04T09:41:41.8873250Z 967a3b1c8fef: Download complete
2025-12-04T09:41:41.9177237Z ff2e6e687b6c: Verifying Checksum
2025-12-04T09:41:41.9177668Z ff2e6e687b6c: Download complete
2025-12-04T09:41:41.9754487Z 52655f8a5bcc: Verifying Checksum
2025-12-04T09:41:41.9755300Z 52655f8a5bcc: Download complete
2025-12-04T09:41:42.0865636Z a64e1a44f22a: Verifying Checksum
2025-12-04T09:41:42.0866107Z a64e1a44f22a: Download complete
2025-12-04T09:41:42.8753600Z 7c40a3faff76: Verifying Checksum
2025-12-04T09:41:42.8754030Z 7c40a3faff76: Download complete
2025-12-04T09:41:43.2858492Z 53c88f1dfeb7: Pull complete
2025-12-04T09:41:43.8822860Z eae668646f44: Pull complete
2025-12-04T09:41:45.8967448Z ff2e6e687b6c: Pull complete
2025-12-04T09:41:51.6887848Z 7c40a3faff76: Pull complete
2025-12-04T09:41:51.9209803Z 967a3b1c8fef: Pull complete
2025-12-04T09:41:52.5760960Z a64e1a44f22a: Pull complete
2025-12-04T09:41:52.5975061Z 52655f8a5bcc: Pull complete
2025-12-04T09:41:52.6105395Z Digest: sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0
2025-12-04T09:41:52.6145981Z Status: Downloaded newer image for public.ecr.aws/docker/library/python:3.13
2025-12-04T09:41:59.8173066Z Thu Dec  4 09:41:59 2025       
2025-12-04T09:41:59.8173548Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:59.8174449Z | NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
2025-12-04T09:41:59.8175055Z |-------------------------------+----------------------+----------------------+
2025-12-04T09:41:59.8175659Z | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2025-12-04T09:41:59.8176303Z | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2025-12-04T09:41:59.8176839Z |                               |                      |               MIG M. |
2025-12-04T09:41:59.8177247Z |===============================+======================+======================|
2025-12-04T09:41:59.8327708Z |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
2025-12-04T09:41:59.8328247Z | N/A   25C    P8    11W /  70W |      2MiB / 15360MiB |      0%      Default |
2025-12-04T09:41:59.8328735Z |                               |                      |                  N/A |
2025-12-04T09:41:59.8329265Z +-------------------------------+----------------------+----------------------+
2025-12-04T09:41:59.8329766Z                                                                                
2025-12-04T09:41:59.8330225Z +-----------------------------------------------------------------------------+
2025-12-04T09:41:59.8330871Z | Processes:                                                                  |
2025-12-04T09:41:59.8331435Z |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2025-12-04T09:41:59.8331910Z |        ID   ID                                                   Usage      |
2025-12-04T09:41:59.8332346Z |=============================================================================|
2025-12-04T09:41:59.8332870Z |  No running processes found                                                 |
2025-12-04T09:41:59.8333431Z +-----------------------------------------------------------------------------+
2025-12-04T09:42:01.3324228Z Command completed after 1 attempt(s).
2025-12-04T09:42:01.3423773Z Prepare all required actions
2025-12-04T09:42:01.3458419Z ##[group]Run ./.github/actions/get-workflow-job-id
2025-12-04T09:42:01.3458814Z with:
2025-12-04T09:42:01.3459515Z   github-token: ***
2025-12-04T09:42:01.3459791Z env:
2025-12-04T09:42:01.3460021Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:01.3460342Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:01.3460717Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:01.3461139Z ##[endgroup]
2025-12-04T09:42:01.3477144Z ##[group]Run set -eux
2025-12-04T09:42:01.3477450Z set -eux
2025-12-04T09:42:01.3477977Z python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}"
2025-12-04T09:42:01.3489938Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:01.3490372Z env:
2025-12-04T09:42:01.3490621Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:01.3490931Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:01.3491506Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:01.3492118Z   GITHUB_TOKEN: ***
2025-12-04T09:42:01.3492375Z ##[endgroup]
2025-12-04T09:42:01.3527785Z + python3 .github/scripts/get_workflow_job_id.py 19922826259 i-03bbda7791efb68ed
2025-12-04T09:42:03.3928672Z Setting output job-id=57119749427
2025-12-04T09:42:03.3930479Z Setting output job-name=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:03.4062013Z ##[group]Run python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:42:03.4062910Z python3 -m pip install psutil==5.9.8 dataclasses_json==0.6.7 nvidia-ml-py==11.525.84
2025-12-04T09:42:03.4064065Z python3 -m tools.stats.monitor --log-interval "$MONITOR_LOG_INTERVAL" --data-collect-interval "$MONITOR_DATA_COLLECT_INTERVAL" > usage_log.txt 2>&1 &
2025-12-04T09:42:03.4065084Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"
2025-12-04T09:42:03.4071854Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:03.4072291Z env:
2025-12-04T09:42:03.4072549Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:03.4072869Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:03.4073228Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:03.4073652Z   JOB_ID: 57119749427
2025-12-04T09:42:03.4074419Z   JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:03.4075223Z   WORKFLOW_NAME: periodic
2025-12-04T09:42:03.4075543Z   WORKFLOW_RUN_ID: 19922826259
2025-12-04T09:42:03.4075870Z   MONITOR_LOG_INTERVAL: 5
2025-12-04T09:42:03.4076173Z   MONITOR_DATA_COLLECT_INTERVAL: 1
2025-12-04T09:42:03.4076513Z ##[endgroup]
2025-12-04T09:42:03.7321083Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T09:42:04.1464489Z Collecting psutil==5.9.8
2025-12-04T09:42:04.1659122Z   Downloading psutil-5.9.8-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (288 kB)
2025-12-04T09:42:04.2507907Z Collecting dataclasses_json==0.6.7
2025-12-04T09:42:04.2552969Z   Downloading dataclasses_json-0.6.7-py3-none-any.whl (28 kB)
2025-12-04T09:42:04.2860830Z Collecting nvidia-ml-py==11.525.84
2025-12-04T09:42:04.2902558Z   Downloading nvidia_ml_py-11.525.84-py3-none-any.whl (34 kB)
2025-12-04T09:42:04.4228915Z Collecting marshmallow<4.0.0,>=3.18.0
2025-12-04T09:42:04.4271966Z   Downloading marshmallow-3.26.1-py3-none-any.whl (50 kB)
2025-12-04T09:42:04.4528198Z Collecting typing-inspect<1,>=0.4.0
2025-12-04T09:42:04.4568199Z   Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)
2025-12-04T09:42:04.5184047Z Collecting packaging>=17.0
2025-12-04T09:42:04.5225174Z   Downloading packaging-25.0-py3-none-any.whl (66 kB)
2025-12-04T09:42:04.5498068Z Collecting mypy-extensions>=0.3.0
2025-12-04T09:42:04.5537748Z   Downloading mypy_extensions-1.1.0-py3-none-any.whl (5.0 kB)
2025-12-04T09:42:04.6071953Z Collecting typing-extensions>=3.7.4
2025-12-04T09:42:04.6111627Z   Downloading typing_extensions-4.15.0-py3-none-any.whl (44 kB)
2025-12-04T09:42:04.7156979Z Installing collected packages: typing-extensions, packaging, mypy-extensions, typing-inspect, marshmallow, psutil, nvidia-ml-py, dataclasses-json
2025-12-04T09:42:05.0330872Z Successfully installed dataclasses-json-0.6.7 marshmallow-3.26.1 mypy-extensions-1.1.0 nvidia-ml-py-11.525.84 packaging-25.0 psutil-5.9.8 typing-extensions-4.15.0 typing-inspect-0.9.0
2025-12-04T09:42:05.2379579Z Prepare all required actions
2025-12-04T09:42:05.2380046Z Getting action download info
2025-12-04T09:42:05.4096234Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:1da556a7aa0a088e3153970611f6c432d58e80e6)
2025-12-04T09:42:05.6495293Z Download action repository 'actions/download-artifact@v4' (SHA:d3f86a106a0bac45b974a628896c90dbdf5c8093)
2025-12-04T09:42:06.0164772Z ##[group]Run ./.github/actions/download-build-artifacts
2025-12-04T09:42:06.0165358Z with:
2025-12-04T09:42:06.0165693Z   name: linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:42:06.0166088Z   s3-bucket: gha-artifacts
2025-12-04T09:42:06.0166379Z env:
2025-12-04T09:42:06.0166626Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:06.0166941Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:06.0167314Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:06.0167718Z ##[endgroup]
2025-12-04T09:42:06.0200133Z ##[group]Run seemethere/download-artifact-s3@v4
2025-12-04T09:42:06.0200536Z with:
2025-12-04T09:42:06.0201139Z   name: linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:42:06.0201520Z   s3-bucket: gha-artifacts
2025-12-04T09:42:06.0201829Z   region: us-east-1
2025-12-04T09:42:06.0202213Z env:
2025-12-04T09:42:06.0202448Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:06.0202756Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:06.0203131Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:06.0203536Z ##[endgroup]
2025-12-04T09:42:06.5658141Z (node:68884) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
2025-12-04T09:42:06.5658740Z 
2025-12-04T09:42:06.5658979Z Please migrate your code to use AWS SDK for JavaScript (v3).
2025-12-04T09:42:06.5659624Z For more information, check the migration guide at https://a.co/7PzMCcy
2025-12-04T09:42:06.5660291Z (Use `node --trace-warnings ...` to show where the warning was created)
2025-12-04T09:42:06.8644352Z Found 1 objects with prefix pytorch/pytorch/19922826259/linux-jammy-cuda12.4-py3.10-gcc11/
2025-12-04T09:42:06.8645240Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:42:14.9231127Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2025-12-04T09:42:14.9237791Z Artifact download has finished successfully
2025-12-04T09:42:14.9440238Z ##[group]Run unzip -o artifacts.zip
2025-12-04T09:42:14.9440625Z unzip -o artifacts.zip
2025-12-04T09:42:14.9448066Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:14.9448513Z env:
2025-12-04T09:42:14.9448746Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:14.9449057Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:14.9449423Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:14.9449828Z ##[endgroup]
2025-12-04T09:42:14.9521289Z Archive:  artifacts.zip
2025-12-04T09:42:14.9522938Z    creating: dist/
2025-12-04T09:42:16.9691753Z   inflating: dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl  
2025-12-04T09:42:16.9835582Z   inflating: dist/.ninja_log         
2025-12-04T09:42:16.9836414Z    creating: build/custom_test_artifacts/
2025-12-04T09:42:16.9836940Z    creating: build/custom_test_artifacts/custom-op-build/
2025-12-04T09:42:16.9837510Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/
2025-12-04T09:42:16.9838217Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/pkgRedirects/
2025-12-04T09:42:16.9846110Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:42:16.9846949Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/
2025-12-04T09:42:16.9847728Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:42:16.9848584Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:42:16.9849417Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:42:16.9850818Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:42:16.9852093Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:42:16.9853008Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:42:16.9853893Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:42:16.9854926Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:42:16.9856415Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:42:16.9857918Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:42:16.9859001Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:42:16.9860662Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:42:16.9862641Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:42:16.9863617Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:42:16.9864486Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:42:16.9926300Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:42:16.9990772Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:42:16.9992078Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:42:17.0059707Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:42:17.0060954Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:42:17.0062232Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:42:17.0063539Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:42:17.0064811Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:42:17.0066041Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:42:17.0067278Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:42:17.0068505Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:42:17.0069716Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:42:17.0070857Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:42:17.0071948Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:42:17.0073033Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:42:17.0074117Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:42:17.0075168Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:42:17.0076462Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:42:17.0154305Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:42:17.0155671Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:42:17.0237359Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:42:17.0238604Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeScratch/
2025-12-04T09:42:17.0239345Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/
2025-12-04T09:42:17.0240105Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:42:17.0240921Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/
2025-12-04T09:42:17.0241829Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts  
2025-12-04T09:42:17.0242888Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make  
2025-12-04T09:42:17.0243862Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make  
2025-12-04T09:42:17.0244783Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt  
2025-12-04T09:42:17.0245719Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake  
2025-12-04T09:42:17.0246644Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make  
2025-12-04T09:42:17.0247588Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake  
2025-12-04T09:42:17.0248527Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make  
2025-12-04T09:42:17.0249453Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make  
2025-12-04T09:42:17.0267186Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d  
2025-12-04T09:42:17.0485771Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o  
2025-12-04T09:42:17.0486655Z    creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/
2025-12-04T09:42:17.0487635Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts  
2025-12-04T09:42:17.0488708Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make  
2025-12-04T09:42:17.0489725Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make  
2025-12-04T09:42:17.0490684Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt  
2025-12-04T09:42:17.0491683Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake  
2025-12-04T09:42:17.0492684Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make  
2025-12-04T09:42:17.0493662Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake  
2025-12-04T09:42:17.0494659Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make  
2025-12-04T09:42:17.0495641Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make  
2025-12-04T09:42:17.0514624Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d  
2025-12-04T09:42:17.0604307Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o  
2025-12-04T09:42:17.0605590Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:42:17.0606540Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:42:17.0607391Z  extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks  
2025-12-04T09:42:17.0608163Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2  
2025-12-04T09:42:17.0608934Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:42:17.0609852Z   inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc  
2025-12-04T09:42:17.0612237Z   inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt  
2025-12-04T09:42:17.0613077Z   inflating: build/custom_test_artifacts/custom-op-build/Makefile  
2025-12-04T09:42:17.0613769Z   inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake  
2025-12-04T09:42:17.0804944Z   inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so  
2025-12-04T09:42:17.0867255Z   inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops  
2025-12-04T09:42:17.0867859Z    creating: build/custom_test_artifacts/jit-hook-build/
2025-12-04T09:42:17.0868426Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/
2025-12-04T09:42:17.0869111Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/pkgRedirects/
2025-12-04T09:42:17.0876621Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:42:17.0877411Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/
2025-12-04T09:42:17.0878188Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:42:17.0879027Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:42:17.0879839Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:42:17.0880900Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:42:17.0882463Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:42:17.0883376Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:42:17.0884239Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:42:17.0885065Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:42:17.0887011Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:42:17.0888479Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:42:17.0889529Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:42:17.0891148Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:42:17.0893258Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:42:17.0894213Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:42:17.0895063Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:42:17.0956822Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:42:17.1021154Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:42:17.1022426Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:42:17.1090206Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:42:17.1091445Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:42:17.1092704Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:42:17.1093977Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:42:17.1095331Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:42:17.1096554Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:42:17.1097791Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:42:17.1098994Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:42:17.1100192Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:42:17.1101488Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:42:17.1102584Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:42:17.1103648Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:42:17.1104726Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:42:17.1105776Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:42:17.1106839Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:42:17.1184913Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:42:17.1185856Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:42:17.1267767Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:42:17.1268712Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeScratch/
2025-12-04T09:42:17.1269425Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/
2025-12-04T09:42:17.1270187Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:42:17.1270998Z    creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/
2025-12-04T09:42:17.1271925Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts  
2025-12-04T09:42:17.1272961Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make  
2025-12-04T09:42:17.1273957Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make  
2025-12-04T09:42:17.1274877Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt  
2025-12-04T09:42:17.1275840Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake  
2025-12-04T09:42:17.1276807Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make  
2025-12-04T09:42:17.1277775Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake  
2025-12-04T09:42:17.1278749Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make  
2025-12-04T09:42:17.1279909Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make  
2025-12-04T09:42:17.1297597Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d  
2025-12-04T09:42:17.1367619Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o  
2025-12-04T09:42:17.1368650Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:42:17.1369769Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:42:17.1370589Z  extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks  
2025-12-04T09:42:17.1371361Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2  
2025-12-04T09:42:17.1372115Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:42:17.1372879Z   inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc  
2025-12-04T09:42:17.1375416Z   inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt  
2025-12-04T09:42:17.1376253Z   inflating: build/custom_test_artifacts/jit-hook-build/Makefile  
2025-12-04T09:42:17.1376942Z   inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake  
2025-12-04T09:42:17.1420659Z   inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks  
2025-12-04T09:42:17.1421296Z    creating: build/custom_test_artifacts/custom-backend-build/
2025-12-04T09:42:17.1421915Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/
2025-12-04T09:42:17.1422667Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/pkgRedirects/
2025-12-04T09:42:17.1430070Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeConfigureLog.yaml  
2025-12-04T09:42:17.1430930Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/
2025-12-04T09:42:17.1431771Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeSystem.cmake  
2025-12-04T09:42:17.1432687Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/
2025-12-04T09:42:17.1433580Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/tmp/
2025-12-04T09:42:17.1434602Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/CMakeCCompilerId.c  
2025-12-04T09:42:17.1435871Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdC/a.out  
2025-12-04T09:42:17.1436854Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCCompiler.cmake  
2025-12-04T09:42:17.1437796Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/
2025-12-04T09:42:17.1438712Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/tmp/
2025-12-04T09:42:17.1440302Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/CMakeCXXCompilerId.cpp  
2025-12-04T09:42:17.1441810Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCXX/a.out  
2025-12-04T09:42:17.1442968Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCXXCompiler.cmake  
2025-12-04T09:42:17.1444642Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_C.bin  
2025-12-04T09:42:17.1446575Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CXX.bin  
2025-12-04T09:42:17.1447617Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/
2025-12-04T09:42:17.1448539Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/
2025-12-04T09:42:17.1510953Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii  
2025-12-04T09:42:17.1574984Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp  
2025-12-04T09:42:17.1576323Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id  
2025-12-04T09:42:17.1643950Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii  
2025-12-04T09:42:17.1645429Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c  
2025-12-04T09:42:17.1646775Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu  
2025-12-04T09:42:17.1648136Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c  
2025-12-04T09:42:17.1649478Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx  
2025-12-04T09:42:17.1650772Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin  
2025-12-04T09:42:17.1652087Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin  
2025-12-04T09:42:17.1653392Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c  
2025-12-04T09:42:17.1654650Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o  
2025-12-04T09:42:17.1655849Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin  
2025-12-04T09:42:17.1657023Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.reg.c  
2025-12-04T09:42:17.1658165Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin  
2025-12-04T09:42:17.1659309Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.fatbin.c  
2025-12-04T09:42:17.1660425Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/tmp/a_dlink.o  
2025-12-04T09:42:17.1661573Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/CMakeCUDACompilerId.cu  
2025-12-04T09:42:17.1738683Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CompilerIdCUDA/a.out  
2025-12-04T09:42:17.1739686Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeCUDACompiler.cmake  
2025-12-04T09:42:17.1821047Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.31.6/CMakeDetermineCompilerABI_CUDA.bin  
2025-12-04T09:42:17.1822056Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeScratch/
2025-12-04T09:42:17.1822851Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/
2025-12-04T09:42:17.1823666Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache  
2025-12-04T09:42:17.1824539Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/
2025-12-04T09:42:17.1825533Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts  
2025-12-04T09:42:17.1826676Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make  
2025-12-04T09:42:17.1827752Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make  
2025-12-04T09:42:17.1828761Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt  
2025-12-04T09:42:17.1830020Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake  
2025-12-04T09:42:17.1831082Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make  
2025-12-04T09:42:17.1832125Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake  
2025-12-04T09:42:17.1833179Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make  
2025-12-04T09:42:17.1834304Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make  
2025-12-04T09:42:17.1835420Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d  
2025-12-04T09:42:17.1965428Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o  
2025-12-04T09:42:17.1966479Z    creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/
2025-12-04T09:42:17.1967534Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts  
2025-12-04T09:42:17.1968721Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make  
2025-12-04T09:42:17.1969871Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make  
2025-12-04T09:42:17.1970936Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt  
2025-12-04T09:42:17.1972041Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake  
2025-12-04T09:42:17.1973168Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make  
2025-12-04T09:42:17.1974287Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake  
2025-12-04T09:42:17.1975382Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make  
2025-12-04T09:42:17.1976473Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make  
2025-12-04T09:42:17.1994479Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d  
2025-12-04T09:42:17.2055292Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o  
2025-12-04T09:42:17.2056439Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake  
2025-12-04T09:42:17.2057456Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt  
2025-12-04T09:42:17.2058368Z  extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks  
2025-12-04T09:42:17.2059209Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2  
2025-12-04T09:42:17.2060033Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake  
2025-12-04T09:42:17.2060850Z   inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc  
2025-12-04T09:42:17.2063210Z   inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt  
2025-12-04T09:42:17.2064737Z   inflating: build/custom_test_artifacts/custom-backend-build/Makefile  
2025-12-04T09:42:17.2065469Z   inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake  
2025-12-04T09:42:17.2177681Z   inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so  
2025-12-04T09:42:17.2221556Z   inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend  
2025-12-04T09:42:17.2222136Z    creating: build/lib/
2025-12-04T09:42:17.2312648Z   inflating: build/lib/libprotobuf-lite.a  
2025-12-04T09:42:17.2800316Z   inflating: build/lib/libprotobuf.a  
2025-12-04T09:42:17.3346555Z   inflating: build/lib/libprotoc.a   
2025-12-04T09:42:17.3357607Z   inflating: build/lib/libpthreadpool.a  
2025-12-04T09:42:17.3366844Z   inflating: build/lib/libcpuinfo.a  
2025-12-04T09:42:17.3375463Z   inflating: build/lib/libcpuinfo_internals.a  
2025-12-04T09:42:17.3376435Z   inflating: build/lib/libclog.a     
2025-12-04T09:42:17.3397451Z   inflating: build/lib/libpytorch_qnnpack.a  
2025-12-04T09:42:17.3400150Z   inflating: build/lib/libnnpack_reference_layers.a  
2025-12-04T09:42:17.3420062Z   inflating: build/lib/libnnpack.a   
2025-12-04T09:42:17.3626055Z   inflating: build/lib/libmicrokernels-prod.a  
2025-12-04T09:42:17.4596709Z   inflating: build/lib/libmicrokernels-all.a  
2025-12-04T09:42:17.4673395Z   inflating: build/lib/libgtest.a    
2025-12-04T09:42:17.4692704Z   inflating: build/lib/libgmock.a    
2025-12-04T09:42:17.4693534Z   inflating: build/lib/libgtest_main.a  
2025-12-04T09:42:17.4694387Z   inflating: build/lib/libgmock_main.a  
2025-12-04T09:42:17.4794305Z   inflating: build/lib/libXNNPACK.a  
2025-12-04T09:42:17.4877682Z   inflating: build/lib/libbenchmark.a  
2025-12-04T09:42:17.4878535Z   inflating: build/lib/libbenchmark_main.a  
2025-12-04T09:42:17.4887506Z   inflating: build/lib/libittnotify.a  
2025-12-04T09:42:17.4960545Z   inflating: build/lib/libasmjit.a   
2025-12-04T09:42:17.4961448Z   inflating: build/lib/libjitprofiling.a  
2025-12-04T09:42:17.6242344Z   inflating: build/lib/libfbgemm.a   
2025-12-04T09:42:17.6276421Z   inflating: build/lib/libtensorpipe_uv.a  
2025-12-04T09:42:17.6872655Z   inflating: build/lib/libtensorpipe.a  
2025-12-04T09:42:17.7140793Z   inflating: build/lib/libtensorpipe_cuda.a  
2025-12-04T09:42:17.7289516Z   inflating: build/lib/libgloo.a     
2025-12-04T09:42:17.7341853Z   inflating: build/lib/libonnx_proto.a  
2025-12-04T09:42:17.7812485Z   inflating: build/lib/libgloo_cuda.a  
2025-12-04T09:42:17.8596908Z   inflating: build/lib/libonnx.a     
2025-12-04T09:42:18.9685725Z   inflating: build/lib/libdnnl.a     
2025-12-04T09:42:18.9707412Z   inflating: build/lib/libfmt.a      
2025-12-04T09:42:19.0236826Z   inflating: build/lib/libkineto.a   
2025-12-04T09:42:19.0366057Z   inflating: build/lib/libc10.so     
2025-12-04T09:42:19.0421088Z   inflating: build/lib/libc10_cuda.so  
2025-12-04T09:42:19.0422726Z   inflating: build/lib/libtorch_global_deps.so  
2025-12-04T09:42:19.0424680Z   inflating: build/lib/libcaffe2_nvrtc.so  
2025-12-04T09:42:22.4588513Z   inflating: build/lib/libtorch_cpu.so  
2025-12-04T09:42:24.2551071Z   inflating: build/lib/libtorch_cuda.so  
2025-12-04T09:42:24.2555707Z   inflating: build/lib/libshm.so     
2025-12-04T09:42:24.2557151Z   inflating: build/lib/libtorch.so   
2025-12-04T09:42:24.2610730Z   inflating: build/lib/libtorch_cuda_linalg.so  
2025-12-04T09:42:24.2613525Z   inflating: build/lib/libc10d_cuda_test.so  
2025-12-04T09:42:24.2691823Z   inflating: build/lib/libtorchbind_test.so  
2025-12-04T09:42:24.2713499Z   inflating: build/lib/libjitbackend_test.so  
2025-12-04T09:42:24.2740003Z   inflating: build/lib/libbackend_with_compiler.so  
2025-12-04T09:42:24.2768936Z   inflating: build/lib/libaoti_custom_ops.so  
2025-12-04T09:42:24.5401263Z   inflating: build/lib/libtorch_python.so  
2025-12-04T09:42:24.5441255Z   inflating: build/lib/libnnapi_backend.so  
2025-12-04T09:42:24.5441649Z    creating: build/bin/
2025-12-04T09:42:24.5951930Z   inflating: build/bin/protoc-3.13.0.0  
2025-12-04T09:42:24.6460796Z   inflating: build/bin/protoc        
2025-12-04T09:42:24.6527311Z   inflating: build/bin/c10_AllocatorConfig_test  
2025-12-04T09:42:24.6589055Z   inflating: build/bin/c10_CompileTimeFunctionPointer_test  
2025-12-04T09:42:24.6652958Z   inflating: build/bin/c10_DeviceGuard_test  
2025-12-04T09:42:24.6716986Z   inflating: build/bin/c10_Device_test  
2025-12-04T09:42:24.6790316Z   inflating: build/bin/c10_DispatchKeySet_test  
2025-12-04T09:42:24.6850877Z   inflating: build/bin/c10_StreamGuard_test  
2025-12-04T09:42:24.6917937Z   inflating: build/bin/c10_Scalar_test  
2025-12-04T09:42:24.6987261Z   inflating: build/bin/c10_SymInt_test  
2025-12-04T09:42:24.7056097Z   inflating: build/bin/c10_InlineStreamGuard_test  
2025-12-04T09:42:24.7123714Z   inflating: build/bin/c10_InlineDeviceGuard_test  
2025-12-04T09:42:24.7192494Z   inflating: build/bin/c10_SizesAndStrides_test  
2025-12-04T09:42:24.7254365Z   inflating: build/bin/c10_ArrayRef_test  
2025-12-04T09:42:24.7315185Z   inflating: build/bin/c10_ConstexprCrc_test  
2025-12-04T09:42:24.7400376Z   inflating: build/bin/c10_cow_test  
2025-12-04T09:42:24.7465631Z   inflating: build/bin/c10_Bitset_test  
2025-12-04T09:42:24.7527632Z   inflating: build/bin/c10_DeadlockDetection_test  
2025-12-04T09:42:24.7597431Z   inflating: build/bin/c10_Enumerate_test  
2025-12-04T09:42:24.7660102Z   inflating: build/bin/c10_Half_test  
2025-12-04T09:42:24.7725515Z   inflating: build/bin/c10_IntrusiveList_test  
2025-12-04T09:42:24.7791025Z   inflating: build/bin/c10_NetworkFlow_test  
2025-12-04T09:42:24.7859659Z   inflating: build/bin/c10_LeftRight_test  
2025-12-04T09:42:24.7921539Z   inflating: build/bin/c10_Synchronized_test  
2025-12-04T09:42:24.7982810Z   inflating: build/bin/c10_Semaphore_test  
2025-12-04T09:42:24.8051320Z   inflating: build/bin/c10_ThreadLocal_test  
2025-12-04T09:42:24.8115255Z   inflating: build/bin/c10_TypeIndex_test  
2025-12-04T09:42:24.8178935Z   inflating: build/bin/c10_accumulate_test  
2025-12-04T09:42:24.8247724Z   inflating: build/bin/c10_bfloat16_test  
2025-12-04T09:42:24.8317182Z   inflating: build/bin/c10_complex_math_test  
2025-12-04T09:42:24.8379362Z   inflating: build/bin/c10_bit_cast_test  
2025-12-04T09:42:24.8440722Z   inflating: build/bin/c10_error_test  
2025-12-04T09:42:24.8509387Z   inflating: build/bin/c10_complex_test  
2025-12-04T09:42:24.8573972Z   inflating: build/bin/c10_exception_test  
2025-12-04T09:42:24.8636253Z   inflating: build/bin/c10_flags_test  
2025-12-04T09:42:24.8698347Z   inflating: build/bin/c10_generic_math_test  
2025-12-04T09:42:24.8882872Z   inflating: build/bin/c10_intrusive_ptr_test  
2025-12-04T09:42:24.8945839Z   inflating: build/bin/c10_irange_test  
2025-12-04T09:42:24.9011829Z   inflating: build/bin/c10_lazy_test  
2025-12-04T09:42:24.9081765Z   inflating: build/bin/c10_logging_test  
2025-12-04T09:42:24.9143603Z   inflating: build/bin/c10_nofatal_test  
2025-12-04T09:42:24.9234051Z   inflating: build/bin/c10_optional_test  
2025-12-04T09:42:24.9309460Z   inflating: build/bin/c10_ordered_preserving_dict_test  
2025-12-04T09:42:24.9375019Z   inflating: build/bin/c10_registry_test  
2025-12-04T09:42:24.9554302Z   inflating: build/bin/c10_small_vector_test  
2025-12-04T09:42:24.9617976Z   inflating: build/bin/c10_ssize_test  
2025-12-04T09:42:24.9687369Z   inflating: build/bin/c10_string_util_test  
2025-12-04T09:42:24.9741672Z   inflating: build/bin/c10_intrusive_ptr_benchmark  
2025-12-04T09:42:24.9803831Z   inflating: build/bin/c10_tempfile_test  
2025-12-04T09:42:24.9864238Z   inflating: build/bin/c10_string_view_test  
2025-12-04T09:42:24.9933451Z   inflating: build/bin/c10_typeid_test  
2025-12-04T09:42:24.9998426Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device  
2025-12-04T09:42:25.0063966Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks  
2025-12-04T09:42:25.0128235Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes  
2025-12-04T09:42:25.0192944Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads  
2025-12-04T09:42:25.0254095Z   inflating: build/bin/c10_cuda_CUDATest  
2025-12-04T09:42:25.0319652Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream  
2025-12-04T09:42:25.0384669Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test  
2025-12-04T09:42:25.0449554Z   inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block  
2025-12-04T09:42:25.1117739Z   inflating: build/bin/vec_test_all_types_DEFAULT  
2025-12-04T09:42:25.1805216Z   inflating: build/bin/vec_test_all_types_AVX512  
2025-12-04T09:42:25.2502947Z   inflating: build/bin/vec_test_all_types_AVX2  
2025-12-04T09:42:25.2564136Z   inflating: build/bin/test_vec_half_DEFAULT  
2025-12-04T09:42:25.2680397Z   inflating: build/bin/test_aoti_abi_check  
2025-12-04T09:42:25.2742448Z   inflating: build/bin/test_vec_half_AVX512  
2025-12-04T09:42:25.2804237Z   inflating: build/bin/test_vec_half_AVX2  
2025-12-04T09:42:25.2868953Z   inflating: build/bin/BackoffTest   
2025-12-04T09:42:25.2935118Z   inflating: build/bin/FileStoreTest  
2025-12-04T09:42:25.3004976Z   inflating: build/bin/TCPStoreTest  
2025-12-04T09:42:25.3071291Z   inflating: build/bin/HashStoreTest  
2025-12-04T09:42:25.3087032Z   inflating: build/bin/ProcessGroupMPITest  
2025-12-04T09:42:25.3091309Z   inflating: build/bin/torch_shm_manager  
2025-12-04T09:42:25.3180023Z   inflating: build/bin/Dict_test     
2025-12-04T09:42:25.3244693Z   inflating: build/bin/Dimname_test  
2025-12-04T09:42:25.3323642Z   inflating: build/bin/MaybeOwned_test  
2025-12-04T09:42:25.3393393Z   inflating: build/bin/NamedTensor_test  
2025-12-04T09:42:25.3465375Z   inflating: build/bin/apply_utils_test  
2025-12-04T09:42:25.3537678Z   inflating: build/bin/atest         
2025-12-04T09:42:25.3615805Z   inflating: build/bin/basic         
2025-12-04T09:42:25.3682528Z   inflating: build/bin/broadcast_test  
2025-12-04T09:42:25.3745527Z   inflating: build/bin/cpu_allocator_test  
2025-12-04T09:42:25.3816543Z   inflating: build/bin/cpu_generator_test  
2025-12-04T09:42:25.3881419Z   inflating: build/bin/cpu_profiling_allocator_test  
2025-12-04T09:42:25.3991387Z   inflating: build/bin/cpu_rng_test  
2025-12-04T09:42:25.4054358Z   inflating: build/bin/dlconvertor_test  
2025-12-04T09:42:25.4125021Z   inflating: build/bin/extension_backend_test  
2025-12-04T09:42:25.4193285Z   inflating: build/bin/half_test     
2025-12-04T09:42:25.4309931Z   inflating: build/bin/ivalue_test   
2025-12-04T09:42:25.4371034Z   inflating: build/bin/lazy_tensor_test  
2025-12-04T09:42:25.4436629Z   inflating: build/bin/math_kernel_test  
2025-12-04T09:42:25.4501979Z   inflating: build/bin/memory_format_test  
2025-12-04T09:42:25.4567830Z   inflating: build/bin/memory_overlapping_test  
2025-12-04T09:42:25.4633500Z   inflating: build/bin/mobile_memory_cleanup  
2025-12-04T09:42:25.4702361Z   inflating: build/bin/native_test   
2025-12-04T09:42:25.4765074Z   inflating: build/bin/operator_name_test  
2025-12-04T09:42:25.4827796Z   inflating: build/bin/operators_test  
2025-12-04T09:42:25.4892706Z   inflating: build/bin/packedtensoraccessor_test  
2025-12-04T09:42:25.4974920Z   inflating: build/bin/pow_test      
2025-12-04T09:42:25.5044939Z   inflating: build/bin/quantized_test  
2025-12-04T09:42:25.5106415Z   inflating: build/bin/reduce_ops_test  
2025-12-04T09:42:25.5169382Z   inflating: build/bin/reportMemoryUsage_test  
2025-12-04T09:42:25.5238545Z   inflating: build/bin/scalar_tensor_test  
2025-12-04T09:42:25.5309356Z   inflating: build/bin/scalar_test   
2025-12-04T09:42:25.5373292Z   inflating: build/bin/StorageUtils_test  
2025-12-04T09:42:25.5437630Z   inflating: build/bin/stride_properties_test  
2025-12-04T09:42:25.5533230Z   inflating: build/bin/tensor_iterator_test  
2025-12-04T09:42:25.5600148Z   inflating: build/bin/test_parallel  
2025-12-04T09:42:25.5662750Z   inflating: build/bin/thread_init_test  
2025-12-04T09:42:25.5730735Z   inflating: build/bin/type_ptr_test  
2025-12-04T09:42:25.5803773Z   inflating: build/bin/type_test     
2025-12-04T09:42:25.5868475Z   inflating: build/bin/undefined_tensor_test  
2025-12-04T09:42:25.5930295Z   inflating: build/bin/verify_api_visibility  
2025-12-04T09:42:25.6016171Z   inflating: build/bin/legacy_vmap_test  
2025-12-04T09:42:25.6079475Z   inflating: build/bin/weakref_test  
2025-12-04T09:42:25.6143393Z   inflating: build/bin/wrapdim_test  
2025-12-04T09:42:25.6206530Z   inflating: build/bin/xla_tensor_test  
2025-12-04T09:42:25.6279464Z   inflating: build/bin/IListRef_test  
2025-12-04T09:42:25.6405325Z   inflating: build/bin/List_test     
2025-12-04T09:42:25.6485749Z   inflating: build/bin/KernelFunction_test  
2025-12-04T09:42:25.6628236Z   inflating: build/bin/kernel_function_legacy_test  
2025-12-04T09:42:25.6742122Z   inflating: build/bin/kernel_function_test  
2025-12-04T09:42:25.6891342Z   inflating: build/bin/kernel_lambda_legacy_test  
2025-12-04T09:42:25.7012276Z   inflating: build/bin/kernel_lambda_test  
2025-12-04T09:42:25.7085546Z   inflating: build/bin/kernel_stackbased_test  
2025-12-04T09:42:25.7199628Z   inflating: build/bin/make_boxed_from_unboxed_functor_test  
2025-12-04T09:42:25.7263066Z   inflating: build/bin/CppSignature_test  
2025-12-04T09:42:25.7330885Z   inflating: build/bin/backend_fallback_test  
2025-12-04T09:42:25.7391764Z   inflating: build/bin/op_allowlist_test  
2025-12-04T09:42:25.7748404Z   inflating: build/bin/op_registration_test  
2025-12-04T09:42:25.7830395Z   inflating: build/bin/inline_container_test  
2025-12-04T09:42:25.7897130Z   inflating: build/bin/cuda_allocator_test  
2025-12-04T09:42:25.7962609Z   inflating: build/bin/cuda_apply_test  
2025-12-04T09:42:25.8035928Z   inflating: build/bin/cuda_atomic_ops_test  
2025-12-04T09:42:25.8105341Z   inflating: build/bin/cuda_caching_host_allocator_test  
2025-12-04T09:42:25.8189827Z   inflating: build/bin/cuda_complex_math_test  
2025-12-04T09:42:25.8263035Z   inflating: build/bin/cuda_complex_test  
2025-12-04T09:42:25.8334760Z   inflating: build/bin/cuda_cub_test  
2025-12-04T09:42:25.8400064Z   inflating: build/bin/cuda_cublas_handle_pool_test  
2025-12-04T09:42:25.8461565Z   inflating: build/bin/cuda_device_test  
2025-12-04T09:42:25.8540660Z   inflating: build/bin/cuda_distributions_test  
2025-12-04T09:42:25.8605095Z   inflating: build/bin/cuda_dlconvertor_test  
2025-12-04T09:42:25.8670975Z   inflating: build/bin/cuda_event_test  
2025-12-04T09:42:25.8732664Z   inflating: build/bin/cuda_exchange_device_test  
2025-12-04T09:42:25.8802409Z   inflating: build/bin/cuda_generator_test  
2025-12-04T09:42:25.8863981Z   inflating: build/bin/cuda_half_test  
2025-12-04T09:42:25.8927200Z   inflating: build/bin/cuda_integer_divider_test  
2025-12-04T09:42:25.8988498Z   inflating: build/bin/cuda_optional_test  
2025-12-04T09:42:25.9052803Z   inflating: build/bin/cuda_packedtensoraccessor_test  
2025-12-04T09:42:25.9117714Z   inflating: build/bin/cuda_reportMemoryUsage_test  
2025-12-04T09:42:25.9179392Z   inflating: build/bin/cuda_allocatorTraceTracker_test  
2025-12-04T09:42:25.9254139Z   inflating: build/bin/cuda_stream_test  
2025-12-04T09:42:25.9319355Z   inflating: build/bin/cuda_vectorized_test  
2025-12-04T09:42:25.9380736Z   inflating: build/bin/cuda_cudnn_test  
2025-12-04T09:42:25.9780393Z   inflating: build/bin/test_lazy     
2025-12-04T09:42:25.9862217Z   inflating: build/bin/ProcessGroupGlooTest  
2025-12-04T09:42:25.9931803Z   inflating: build/bin/ProcessGroupGlooAsyncTest  
2025-12-04T09:42:26.1186261Z   inflating: build/bin/test_jit      
2025-12-04T09:42:26.1264202Z   inflating: build/bin/ProcessGroupNCCLTest  
2025-12-04T09:42:26.1339454Z   inflating: build/bin/ProcessGroupNCCLErrorsTest  
2025-12-04T09:42:26.1343030Z   inflating: build/bin/example_allreduce  
2025-12-04T09:42:26.1411359Z   inflating: build/bin/test_dist_autograd  
2025-12-04T09:42:26.1494609Z   inflating: build/bin/test_cpp_rpc  
2025-12-04T09:42:26.1497340Z   inflating: build/bin/parallel_benchmark  
2025-12-04T09:42:26.2837817Z   inflating: build/bin/test_api      
2025-12-04T09:42:26.2838241Z    creating: .additional_ci_files/
2025-12-04T09:42:26.2910037Z   inflating: .additional_ci_files/test-times.json  
2025-12-04T09:42:26.3172259Z   inflating: .additional_ci_files/test-class-times.json  
2025-12-04T09:42:26.3204202Z ##[group]Run rm artifacts.zip
2025-12-04T09:42:26.3204574Z rm artifacts.zip
2025-12-04T09:42:26.3211708Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:26.3212143Z env:
2025-12-04T09:42:26.3212407Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:26.3212723Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:26.3213248Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:26.3213676Z ##[endgroup]
2025-12-04T09:42:26.3834481Z ##[group]Run df -H
2025-12-04T09:42:26.3834786Z df -H
2025-12-04T09:42:26.3841399Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:26.3841836Z env:
2025-12-04T09:42:26.3842188Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:26.3842508Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:26.3842880Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:26.3843494Z ##[endgroup]
2025-12-04T09:42:26.3893866Z Filesystem        Size  Used Avail Use% Mounted on
2025-12-04T09:42:26.3894313Z devtmpfs          4.2M     0  4.2M   0% /dev
2025-12-04T09:42:26.3894720Z tmpfs              34G     0   34G   0% /dev/shm
2025-12-04T09:42:26.3895128Z tmpfs              14G  562k   14G   1% /run
2025-12-04T09:42:26.3895509Z /dev/nvme0n1p1    161G   51G  111G  32% /
2025-12-04T09:42:26.3896017Z tmpfs              34G   17k   34G   1% /tmp
2025-12-04T09:42:26.3896483Z /dev/nvme0n1p128   11M  1.4M  9.2M  13% /boot/efi
2025-12-04T09:42:26.3896900Z tmpfs             6.7G     0  6.7G   0% /run/user/0
2025-12-04T09:42:26.3935163Z Prepare all required actions
2025-12-04T09:42:26.3936106Z Getting action download info
2025-12-04T09:42:26.5601979Z ##[group]Run ./.github/actions/download-td-artifacts
2025-12-04T09:42:26.5602474Z with:
2025-12-04T09:42:26.5602734Z env:
2025-12-04T09:42:26.5603006Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:26.5603346Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:26.5603704Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:26.5604130Z ##[endgroup]
2025-12-04T09:42:26.5635529Z ##[group]Run seemethere/download-artifact-s3@v4
2025-12-04T09:42:26.5635933Z with:
2025-12-04T09:42:26.5636161Z   name: td_results
2025-12-04T09:42:26.5636443Z   s3-bucket: gha-artifacts
2025-12-04T09:42:26.5636749Z   region: us-east-1
2025-12-04T09:42:26.5636992Z env:
2025-12-04T09:42:26.5637236Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:26.5637543Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:26.5637911Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:26.5638394Z ##[endgroup]
2025-12-04T09:42:27.1198800Z (node:68908) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.
2025-12-04T09:42:27.1199411Z 
2025-12-04T09:42:27.1199633Z Please migrate your code to use AWS SDK for JavaScript (v3).
2025-12-04T09:42:27.1200422Z For more information, check the migration guide at https://a.co/7PzMCcy
2025-12-04T09:42:27.1201279Z (Use `node --trace-warnings ...` to show where the warning was created)
2025-12-04T09:42:27.2303755Z Found 1 objects with prefix pytorch/pytorch/19922826259/td_results/
2025-12-04T09:42:27.2304514Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json
2025-12-04T09:42:27.3287529Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/td_results.json
2025-12-04T09:42:27.3293382Z Artifact download has finished successfully
2025-12-04T09:42:27.3491374Z ##[group]Run mkdir -p .additional_ci_files
2025-12-04T09:42:27.3492062Z mkdir -p .additional_ci_files
2025-12-04T09:42:27.3492871Z mv td_results.json .additional_ci_files/td_results.json || true
2025-12-04T09:42:27.3503261Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:27.3504017Z env:
2025-12-04T09:42:27.3504436Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:27.3504981Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:27.3505601Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:27.3506346Z ##[endgroup]
2025-12-04T09:42:27.3633666Z ##[group]Run .github/scripts/parse_ref.py
2025-12-04T09:42:27.3634113Z .github/scripts/parse_ref.py
2025-12-04T09:42:27.3640328Z shell: /usr/bin/bash -e {0}
2025-12-04T09:42:27.3640647Z env:
2025-12-04T09:42:27.3640896Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:27.3641196Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:27.3641567Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:27.3641988Z ##[endgroup]
2025-12-04T09:42:27.3969394Z Setting output branch=main
2025-12-04T09:42:27.4117140Z Prepare all required actions
2025-12-04T09:42:27.4117619Z Getting action download info
2025-12-04T09:42:27.5589136Z ##[group]Run ./.github/actions/filter-test-configs
2025-12-04T09:42:27.5589551Z with:
2025-12-04T09:42:27.5590032Z   github-token: ***
2025-12-04T09:42:27.5599408Z   test-matrix: {"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}
2025-12-04T09:42:27.5613390Z   job-name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:27.5614791Z env:
2025-12-04T09:42:27.5615211Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:27.5615758Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:27.5616400Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:27.5617160Z ##[endgroup]
2025-12-04T09:42:27.5708743Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T09:42:27.5709370Z with:
2025-12-04T09:42:27.5709727Z   shell: bash
2025-12-04T09:42:27.5710161Z   timeout_minutes: 10
2025-12-04T09:42:27.5710622Z   max_attempts: 5
2025-12-04T09:42:27.5711055Z   retry_wait_seconds: 30
2025-12-04T09:42:27.5712734Z   command: set -eux
# PyYAML 6.0 doesn't work with MacOS x86 anymore
# This must run on Python-3.7 (AmazonLinux2) so can't use request=3.32.2
python3 -m pip install requests==2.27.1 pyyaml==6.0.2

2025-12-04T09:42:27.5714608Z   polling_interval_seconds: 1
2025-12-04T09:42:27.5715198Z   warning_on_retry: true
2025-12-04T09:42:27.5715734Z   continue_on_error: false
2025-12-04T09:42:27.5716236Z env:
2025-12-04T09:42:27.5716646Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:27.5717165Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:27.5717807Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:27.5719084Z   GITHUB_TOKEN: ***
2025-12-04T09:42:27.5719556Z ##[endgroup]
2025-12-04T09:42:27.6804649Z + python3 -m pip install requests==2.27.1 pyyaml==6.0.2
2025-12-04T09:42:27.9572725Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T09:42:28.1276859Z Collecting requests==2.27.1
2025-12-04T09:42:28.1454118Z   Downloading requests-2.27.1-py2.py3-none-any.whl (63 kB)
2025-12-04T09:42:28.3968097Z Collecting pyyaml==6.0.2
2025-12-04T09:42:28.4030096Z   Downloading PyYAML-6.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (737 kB)
2025-12-04T09:42:28.9329743Z Collecting charset-normalizer~=2.0.0
2025-12-04T09:42:28.9369884Z   Downloading charset_normalizer-2.0.12-py3-none-any.whl (39 kB)
2025-12-04T09:42:29.0458062Z Collecting certifi>=2017.4.17
2025-12-04T09:42:29.0497431Z   Downloading certifi-2025.11.12-py3-none-any.whl (159 kB)
2025-12-04T09:42:29.0828107Z Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (2.10)
2025-12-04T09:42:29.0833288Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3.9/site-packages (from requests==2.27.1) (1.25.10)
2025-12-04T09:42:29.1798696Z Installing collected packages: charset-normalizer, certifi, requests, pyyaml
2025-12-04T09:42:29.4899587Z Successfully installed certifi-2025.11.12 charset-normalizer-2.0.12 pyyaml-6.0.2 requests-2.27.1
2025-12-04T09:42:29.6587604Z Command completed after 1 attempt(s).
2025-12-04T09:42:29.6664884Z ##[group]Run set -x
2025-12-04T09:42:29.6665179Z set -x
2025-12-04T09:42:29.6665442Z 
2025-12-04T09:42:29.6665908Z # Use relative path here as this could be checked out anywhere, not necessarily
2025-12-04T09:42:29.6666465Z # in runner workspace
2025-12-04T09:42:29.6666921Z python3 "${GITHUB_ACTION_PATH}/../../scripts/parse_ref.py"
2025-12-04T09:42:29.6673684Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:29.6674126Z env:
2025-12-04T09:42:29.6674380Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:29.6674687Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:29.6675066Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:29.6675484Z ##[endgroup]
2025-12-04T09:42:29.6704059Z + python3 /home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/filter-test-configs/../../scripts/parse_ref.py
2025-12-04T09:42:29.6911661Z Setting output branch=main
2025-12-04T09:42:29.6992776Z ##[group]Run echo "Workflow: ${GITHUB_WORKFLOW}"
2025-12-04T09:42:29.6993278Z echo "Workflow: ${GITHUB_WORKFLOW}"
2025-12-04T09:42:29.6993700Z echo "Job name: ${JOB_NAME}"
2025-12-04T09:42:29.6994073Z 
2025-12-04T09:42:29.6994518Z # Use relative path here as this could be checked out anywhere, not necessarily
2025-12-04T09:42:29.6995084Z # in runner workspace
2025-12-04T09:42:29.6995587Z python3 "${GITHUB_ACTION_PATH}/../../scripts/filter_test_configs.py" \
2025-12-04T09:42:29.6996175Z   --workflow "${GITHUB_WORKFLOW}" \
2025-12-04T09:42:29.6996606Z   --job-name "${JOB_NAME}" \
2025-12-04T09:42:29.7004235Z   --test-matrix "{"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}" \
2025-12-04T09:42:29.7011691Z   --selected-test-configs "" \
2025-12-04T09:42:29.7012106Z   --pr-number "${PR_NUMBER}" \
2025-12-04T09:42:29.7012588Z   --tag "${TAG}" \
2025-12-04T09:42:29.7012935Z   --event-name "${EVENT_NAME}" \
2025-12-04T09:42:29.7013319Z   --schedule "${SCHEDULE}" \
2025-12-04T09:42:29.7013671Z   --branch "${HEAD_BRANCH}"
2025-12-04T09:42:29.7020627Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:29.7021075Z env:
2025-12-04T09:42:29.7021327Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:29.7021630Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:29.7022003Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:29.7022769Z   GITHUB_TOKEN: ***
2025-12-04T09:42:29.7023500Z   JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:29.7024272Z   PR_NUMBER: 
2025-12-04T09:42:29.7024524Z   TAG: 
2025-12-04T09:42:29.7024780Z   EVENT_NAME: schedule
2025-12-04T09:42:29.7025092Z   SCHEDULE: 29 8 * * *
2025-12-04T09:42:29.7025384Z   HEAD_BRANCH: main
2025-12-04T09:42:29.7025651Z ##[endgroup]
2025-12-04T09:42:29.7052760Z Workflow: periodic
2025-12-04T09:42:29.7053509Z Job name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:29.9051728Z Setting output keep-going=True
2025-12-04T09:42:29.9052158Z Setting output ci-verbose-test-logs=False
2025-12-04T09:42:29.9052560Z Setting output ci-test-showlocals=False
2025-12-04T09:42:29.9052960Z Setting output ci-no-test-timeout=False
2025-12-04T09:42:29.9053338Z Setting output ci-no-td=False
2025-12-04T09:42:29.9053714Z Setting output ci-td-distributed=False
2025-12-04T09:42:29.9054094Z Setting output is-unstable=True
2025-12-04T09:42:29.9054445Z Setting output reenabled-issues=
2025-12-04T09:42:29.9070595Z Setting output test-matrix={"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": 
"rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}
2025-12-04T09:42:29.9086739Z Setting output is-test-matrix-empty=False
2025-12-04T09:42:29.9271967Z ##[group]Run echo "Filtered matrix:"
2025-12-04T09:42:29.9272446Z echo "Filtered matrix:"
2025-12-04T09:42:29.9288342Z [36;1mecho "{"include": [{"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 1, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 2, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, 
{"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 3, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 4, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "mem_leak_check": "mem_leak_check", "unstable": "unstable", "rerun_disabled_tests": "rerun_disabled_tests"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable", "mem_leak_check": "mem_leak_check"}, {"config": "legacy_nvidia_driver", "shard": 5, "num_shards": 5, "runner": "linux.g4dn.4xlarge.nvidia.gpu", "rerun_disabled_tests": "rerun_disabled_tests", "unstable": "unstable"}]}"[0m
2025-12-04T09:42:29.9304508Z 
2025-12-04T09:42:29.9304754Z echo
2025-12-04T09:42:29.9305072Z echo "Is the current job unstable? True"
2025-12-04T09:42:29.9305445Z 
2025-12-04T09:42:29.9305681Z echo
2025-12-04T09:42:29.9305987Z echo "Is keep-going label set? True"
2025-12-04T09:42:29.9306347Z 
2025-12-04T09:42:29.9306581Z echo
2025-12-04T09:42:29.9306866Z echo "Reenabled issues? "
2025-12-04T09:42:29.9313684Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:29.9314136Z env:
2025-12-04T09:42:29.9314397Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:29.9314714Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:29.9315070Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:29.9315501Z ##[endgroup]
2025-12-04T09:42:29.9342961Z Filtered matrix:
2025-12-04T09:42:29.9362142Z {include: [{config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 1, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 2, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 3, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 4, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check: mem_leak_check, unstable: unstable, rerun_disabled_tests: rerun_disabled_tests}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable, mem_leak_check: mem_leak_check}, {config: legacy_nvidia_driver, shard: 5, num_shards: 5, runner: linux.g4dn.4xlarge.nvidia.gpu, rerun_disabled_tests: rerun_disabled_tests, unstable: unstable}]}
2025-12-04T09:42:29.9377724Z 
2025-12-04T09:42:29.9377847Z Is the current job unstable? True
2025-12-04T09:42:29.9378088Z 
2025-12-04T09:42:29.9378206Z Is keep-going label set? True
2025-12-04T09:42:29.9378428Z 
2025-12-04T09:42:29.9378544Z Reenabled issues? 
2025-12-04T09:42:29.9455048Z ##[group]Run echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}"
2025-12-04T09:42:29.9455685Z echo "timeout=$((JOB_TIMEOUT-30))" >> "${GITHUB_OUTPUT}"
2025-12-04T09:42:29.9462232Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:29.9462663Z env:
2025-12-04T09:42:29.9462914Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:29.9463234Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:29.9463593Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:29.9464016Z   JOB_TIMEOUT: 600
2025-12-04T09:42:29.9464287Z ##[endgroup]
2025-12-04T09:42:29.9545107Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:42:29.9545730Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:42:29.9546290Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}"
2025-12-04T09:42:29.9552587Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T09:42:29.9553033Z env:
2025-12-04T09:42:29.9553289Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:29.9553603Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:29.9553954Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:29.9554365Z ##[endgroup]
2025-12-04T09:42:29.9690941Z ##[group]Run set -x
2025-12-04T09:42:29.9691351Z set -x
2025-12-04T09:42:29.9691608Z 
2025-12-04T09:42:29.9691902Z if [[ $TEST_CONFIG == 'multigpu' ]]; then
2025-12-04T09:42:29.9692354Z   TEST_COMMAND=.ci/pytorch/multigpu-test.sh
2025-12-04T09:42:29.9692823Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then
2025-12-04T09:42:29.9693251Z   TEST_COMMAND=.ci/onnx/test.sh
2025-12-04T09:42:29.9693594Z else
2025-12-04T09:42:29.9694033Z   TEST_COMMAND=.ci/pytorch/test.sh
2025-12-04T09:42:29.9694408Z fi
2025-12-04T09:42:29.9694636Z 
2025-12-04T09:42:29.9694949Z # Leaving 1GB for the runner and other things
2025-12-04T09:42:29.9695646Z TOTAL_AVAILABLE_MEMORY_IN_GB=$(awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo)
2025-12-04T09:42:29.9696695Z # https://docs.docker.com/engine/containers/resource_constraints/#--memory-swap-details, the 3GB swap
2025-12-04T09:42:29.9697547Z # comes from https://github.com/pytorch/test-infra/pull/6058
2025-12-04T09:42:29.9698184Z TOTAL_MEMORY_WITH_SWAP=$(("${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}" + 3))
2025-12-04T09:42:29.9698679Z 
2025-12-04T09:42:29.9698974Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then
2025-12-04T09:42:29.9699385Z   SHM_OPTS=
2025-12-04T09:42:29.9699677Z   JENKINS_USER=
2025-12-04T09:42:29.9700081Z   # ensure that docker container cleanly exits in 12 hours
2025-12-04T09:42:29.9700658Z   # if for some reason cleanup action doesn't stop container
2025-12-04T09:42:29.9701337Z   # when job is cancelled
2025-12-04T09:42:29.9701707Z   DOCKER_SHELL_CMD="sleep 12h"
2025-12-04T09:42:29.9702090Z   USED_IMAGE="${DOCKER_IMAGE_S390X}"
2025-12-04T09:42:29.9702459Z else
2025-12-04T09:42:29.9702757Z   SHM_OPTS="--shm-size=${SHM_SIZE}"
2025-12-04T09:42:29.9703146Z   JENKINS_USER="--user jenkins"
2025-12-04T09:42:29.9703514Z   DOCKER_SHELL_CMD=
2025-12-04T09:42:29.9703976Z   USED_IMAGE="${DOCKER_IMAGE}"
2025-12-04T09:42:29.9704312Z fi
2025-12-04T09:42:29.9704615Z 
2025-12-04T09:42:29.9705014Z # detached container should get cleaned up by teardown_ec2_linux
2025-12-04T09:42:29.9705790Z # TODO: Stop building test binaries as part of the build phase
2025-12-04T09:42:29.9706801Z # Used for GPU_FLAG, SHM_OPTS, JENKINS_USER and DOCKER_SHELL_CMD since that doesn't play nice
2025-12-04T09:42:29.9707465Z # shellcheck disable=SC2086,SC2090
2025-12-04T09:42:29.9707861Z container_name=$(docker run \
2025-12-04T09:42:29.9708218Z   ${GPU_FLAG:-} \
2025-12-04T09:42:29.9708573Z   ${SCCACHE_SERVER_PORT_DOCKER_FLAG:-} \
2025-12-04T09:42:29.9708984Z   -e BUILD_ENVIRONMENT \
2025-12-04T09:42:29.9709333Z   -e PR_NUMBER \
2025-12-04T09:42:29.9709647Z   -e GITHUB_ACTIONS \
2025-12-04T09:42:29.9709988Z   -e GITHUB_REPOSITORY \
2025-12-04T09:42:29.9710341Z   -e GITHUB_WORKFLOW \
2025-12-04T09:42:29.9710660Z   -e GITHUB_JOB \
2025-12-04T09:42:29.9710971Z   -e GITHUB_RUN_ID \
2025-12-04T09:42:29.9711298Z   -e GITHUB_RUN_NUMBER \
2025-12-04T09:42:29.9711636Z   -e GITHUB_RUN_ATTEMPT \
2025-12-04T09:42:29.9711976Z   -e JOB_ID \
2025-12-04T09:42:29.9712271Z   -e JOB_NAME \
2025-12-04T09:42:29.9712574Z   -e BASE_SHA \
2025-12-04T09:42:29.9712861Z   -e BRANCH \
2025-12-04T09:42:29.9713149Z   -e SHA1 \
2025-12-04T09:42:29.9713440Z   -e AWS_DEFAULT_REGION \
2025-12-04T09:42:29.9713774Z   -e IN_WHEEL_TEST \
2025-12-04T09:42:29.9714099Z   -e SHARD_NUMBER \
2025-12-04T09:42:29.9714420Z   -e TEST_CONFIG \
2025-12-04T09:42:29.9714729Z   -e NUM_TEST_SHARDS \
2025-12-04T09:42:29.9715277Z   -e REENABLED_ISSUES \
2025-12-04T09:42:29.9715644Z   -e CONTINUE_THROUGH_ERROR \
2025-12-04T09:42:29.9716002Z   -e VERBOSE_TEST_LOGS \
2025-12-04T09:42:29.9716354Z   -e TEST_SHOWLOCALS \
2025-12-04T09:42:29.9716692Z   -e NO_TEST_TIMEOUT \
2025-12-04T09:42:29.9717024Z   -e NO_TD \
2025-12-04T09:42:29.9717311Z   -e TD_DISTRIBUTED \
2025-12-04T09:42:29.9717645Z   -e PR_LABELS \
2025-12-04T09:42:29.9717996Z   -e MAX_JOBS="$(nproc --ignore=2)" \
2025-12-04T09:42:29.9718494Z   -e SCCACHE_BUCKET \
2025-12-04T09:42:29.9718830Z   -e SCCACHE_REGION \
2025-12-04T09:42:29.9719156Z   -e XLA_CUDA \
2025-12-04T09:42:29.9719482Z   -e XLA_CLANG_CACHE_S3_BUCKET_NAME \
2025-12-04T09:42:29.9719910Z   -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \
2025-12-04T09:42:29.9720345Z   -e PYTORCH_TEST_RERUN_DISABLED_TESTS \
2025-12-04T09:42:29.9720782Z   -e SKIP_SCCACHE_INITIALIZATION=1 \
2025-12-04T09:42:29.9721174Z   -e HUGGING_FACE_HUB_TOKEN \
2025-12-04T09:42:29.9721561Z   -e VLLM_TEST_HUGGING_FACE_TOKEN \
2025-12-04T09:42:29.9721967Z   -e SCRIBE_GRAPHQL_ACCESS_TOKEN \
2025-12-04T09:42:29.9722425Z   -e DASHBOARD_TAG \
2025-12-04T09:42:29.9722761Z   -e ARTIFACTS_FILE_SUFFIX \
2025-12-04T09:42:29.9723194Z   --memory="${TOTAL_AVAILABLE_MEMORY_IN_GB%.*}g" \
2025-12-04T09:42:29.9723673Z   --memory-swap="${TOTAL_MEMORY_WITH_SWAP}g" \
2025-12-04T09:42:29.9724166Z   --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
2025-12-04T09:42:29.9724631Z   --security-opt seccomp=unconfined \
2025-12-04T09:42:29.9725030Z   --cap-add=SYS_PTRACE \
2025-12-04T09:42:29.9725363Z   --ipc=host \
2025-12-04T09:42:29.9725663Z   ${SHM_OPTS} \
2025-12-04T09:42:29.9725960Z   --tty \
2025-12-04T09:42:29.9726224Z   --detach \
2025-12-04T09:42:29.9726535Z   --name="${container_name}" \
2025-12-04T09:42:29.9726902Z   ${JENKINS_USER} \
2025-12-04T09:42:29.9727299Z   -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
2025-12-04T09:42:29.9727768Z   -w /var/lib/jenkins/workspace \
2025-12-04T09:42:29.9728144Z   "${USED_IMAGE}" \
2025-12-04T09:42:29.9728467Z   ${DOCKER_SHELL_CMD}
2025-12-04T09:42:29.9728768Z )
2025-12-04T09:42:29.9729159Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}"
2025-12-04T09:42:29.9729647Z 
2025-12-04T09:42:29.9729944Z if [[ ${BUILD_ENVIRONMENT} == *"s390x"* ]]; then
2025-12-04T09:42:29.9730628Z   docker exec -t "${container_name}" sh -c "python3 -m pip install -r .ci/docker/requirements-ci.txt"
2025-12-04T09:42:29.9731243Z fi
2025-12-04T09:42:29.9731484Z 
2025-12-04T09:42:29.9732053Z docker exec -t "${container_name}" sh -c "python3 -m pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}"
2025-12-04T09:42:29.9738725Z shell: /usr/bin/bash -e {0}
2025-12-04T09:42:29.9739044Z env:
2025-12-04T09:42:29.9739282Z   GIT_DEFAULT_BRANCH: main
2025-12-04T09:42:29.9739594Z   HAS_NVIDIA_GPU: true
2025-12-04T09:42:29.9739963Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:29.9740460Z   BUILD_ENVIRONMENT: linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:42:29.9740884Z   PR_NUMBER: 
2025-12-04T09:42:29.9741167Z   GITHUB_REPOSITORY: pytorch/pytorch
2025-12-04T09:42:29.9741544Z   GITHUB_WORKFLOW: periodic
2025-12-04T09:42:29.9741839Z   GITHUB_JOB: test
2025-12-04T09:42:29.9742118Z   GITHUB_RUN_ID: 19922826259
2025-12-04T09:42:29.9742433Z   GITHUB_RUN_NUMBER: 19107
2025-12-04T09:42:29.9742724Z   GITHUB_RUN_ATTEMPT: 1
2025-12-04T09:42:29.9743017Z   JOB_ID: 57119749427
2025-12-04T09:42:29.9743739Z   JOB_NAME: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:29.9744621Z   BRANCH: main
2025-12-04T09:42:29.9744937Z   SHA1: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:29.9745398Z   BASE_SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:29.9745821Z   TEST_CONFIG: legacy_nvidia_driver
2025-12-04T09:42:29.9746164Z   SHARD_NUMBER: 4
2025-12-04T09:42:29.9746433Z   NUM_TEST_SHARDS: 5
2025-12-04T09:42:29.9746710Z   EXTRA_FLAGS: 
2025-12-04T09:42:29.9746959Z   OP_BENCHMARK_TESTS: 
2025-12-04T09:42:29.9747245Z   REENABLED_ISSUES: 
2025-12-04T09:42:29.9747607Z   CONTINUE_THROUGH_ERROR: True
2025-12-04T09:42:29.9747923Z   VERBOSE_TEST_LOGS: False
2025-12-04T09:42:29.9748233Z   TEST_SHOWLOCALS: False
2025-12-04T09:42:29.9748539Z   NO_TEST_TIMEOUT: False
2025-12-04T09:42:29.9748813Z   NO_TD: False
2025-12-04T09:42:29.9749077Z   TD_DISTRIBUTED: False
2025-12-04T09:42:29.9749442Z   SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
2025-12-04T09:42:29.9749854Z   SCCACHE_REGION: us-east-1
2025-12-04T09:42:29.9750159Z   SHM_SIZE: 2g
2025-12-04T09:42:29.9751084Z   DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:42:29.9752778Z   DOCKER_IMAGE_S390X: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:42:29.9753792Z   XLA_CUDA: 
2025-12-04T09:42:29.9754211Z   XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:42:29.9754753Z   PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 1
2025-12-04T09:42:29.9755131Z   PYTORCH_TEST_RERUN_DISABLED_TESTS: 0
2025-12-04T09:42:29.9755473Z   DASHBOARD_TAG: 
2025-12-04T09:42:29.9755980Z   VLLM_TEST_HUGGING_FACE_TOKEN: ***
2025-12-04T09:42:29.9756464Z   HUGGING_FACE_HUB_TOKEN: ***
2025-12-04T09:42:29.9756951Z   SCRIBE_GRAPHQL_ACCESS_TOKEN: ***
2025-12-04T09:42:29.9757556Z   ARTIFACTS_FILE_SUFFIX: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427
2025-12-04T09:42:29.9758210Z ##[endgroup]
2025-12-04T09:42:29.9784842Z + [[ legacy_nvidia_driver == \m\u\l\t\i\g\p\u ]]
2025-12-04T09:42:29.9785309Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *onnx* ]]
2025-12-04T09:42:29.9785725Z + TEST_COMMAND=.ci/pytorch/test.sh
2025-12-04T09:42:29.9788569Z ++ awk '/MemTotal/ { printf "%.3f \n", $2/1024/1024 - 1 }' /proc/meminfo
2025-12-04T09:42:29.9810598Z + TOTAL_AVAILABLE_MEMORY_IN_GB='61.094 '
2025-12-04T09:42:29.9810988Z + TOTAL_MEMORY_WITH_SWAP=64
2025-12-04T09:42:29.9811393Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *\s\3\9\0\x* ]]
2025-12-04T09:42:29.9811847Z + SHM_OPTS=--shm-size=2g
2025-12-04T09:42:29.9812159Z + JENKINS_USER='--user jenkins'
2025-12-04T09:42:29.9812469Z + DOCKER_SHELL_CMD=
2025-12-04T09:42:29.9813400Z + USED_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:42:29.9819835Z +++ nproc --ignore=2
2025-12-04T09:42:30.0014949Z ++ docker run --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all -e BUILD_ENVIRONMENT -e PR_NUMBER -e GITHUB_ACTIONS -e GITHUB_REPOSITORY -e GITHUB_WORKFLOW -e GITHUB_JOB -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_RUN_ATTEMPT -e JOB_ID -e JOB_NAME -e BASE_SHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e TEST_CONFIG -e NUM_TEST_SHARDS -e REENABLED_ISSUES -e CONTINUE_THROUGH_ERROR -e VERBOSE_TEST_LOGS -e TEST_SHOWLOCALS -e NO_TEST_TIMEOUT -e NO_TD -e TD_DISTRIBUTED -e PR_LABELS -e MAX_JOBS=14 -e SCCACHE_BUCKET -e SCCACHE_REGION -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS -e SKIP_SCCACHE_INITIALIZATION=1 -e HUGGING_FACE_HUB_TOKEN -e VLLM_TEST_HUGGING_FACE_TOKEN -e SCRIBE_GRAPHQL_ACCESS_TOKEN -e DASHBOARD_TAG -e ARTIFACTS_FILE_SUFFIX --memory=61g --memory-swap=64g --env-file=/tmp/github_env_19922826259 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T09:42:37.6856201Z + container_name=428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T09:42:37.6857078Z + echo DOCKER_CONTAINER_ID=428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T09:42:37.6857906Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *\s\3\9\0\x* ]]
2025-12-04T09:42:37.6862585Z ++ echo dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl
2025-12-04T09:42:37.6865006Z + docker exec -t 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4 sh -c 'python3 -m pip install dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl[opt-einsum] && .ci/pytorch/test.sh'
2025-12-04T09:42:38.1856489Z Processing ./dist/torch-2.10.0a0+gitffd9b0f-cp310-cp310-linux_x86_64.whl (from torch==2.10.0a0+gitffd9b0f)
2025-12-04T09:42:39.0644388Z Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.18.0)
2025-12-04T09:42:39.0649217Z Requirement already satisfied: typing-extensions>=4.10.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (4.12.2)
2025-12-04T09:42:39.0667263Z Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.13.3)
2025-12-04T09:42:39.0669328Z Requirement already satisfied: networkx>=2.5.1 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2.8.8)
2025-12-04T09:42:39.0670898Z Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.1.6)
2025-12-04T09:42:39.0672485Z Requirement already satisfied: fsspec>=0.8.5 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (2025.10.0)
2025-12-04T09:42:39.0684787Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.3.0)
2025-12-04T09:42:39.1111351Z Requirement already satisfied: numpy>=1.7 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.22.4)
2025-12-04T09:42:39.1134394Z Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (1.3.0)
2025-12-04T09:42:39.1203113Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch==2.10.0a0+gitffd9b0f->torch==2.10.0a0+gitffd9b0f) (3.0.3)
2025-12-04T09:42:39.5471954Z Installing collected packages: torch
2025-12-04T09:42:51.9109250Z Successfully installed torch-2.10.0a0+gitffd9b0f
2025-12-04T09:42:51.9881701Z + export TERM=vt100
2025-12-04T09:42:51.9882054Z + TERM=vt100
2025-12-04T09:42:51.9883534Z ++ dirname .ci/pytorch/test.sh
2025-12-04T09:42:51.9892290Z + source .ci/pytorch/common.sh
2025-12-04T09:42:51.9895763Z +++ dirname .ci/pytorch/common.sh
2025-12-04T09:42:51.9903626Z ++ source .ci/pytorch/common_utils.sh
2025-12-04T09:42:51.9905087Z +++ declare -f -t trap_add
2025-12-04T09:42:51.9911111Z ++ set -ex -o pipefail
2025-12-04T09:42:51.9911467Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]]
2025-12-04T09:42:51.9911893Z ++ BUILD_TEST_LIBTORCH=0
2025-12-04T09:42:51.9915307Z ++ dirname .ci/pytorch/test.sh
2025-12-04T09:42:51.9923144Z + source .ci/pytorch/common-build.sh
2025-12-04T09:42:51.9924964Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 != *win-* ]]
2025-12-04T09:42:51.9931408Z ++++ dirname .ci/pytorch/common-build.sh
2025-12-04T09:42:51.9940303Z +++ cd .ci/pytorch
2025-12-04T09:42:51.9940655Z +++ pwd -P
2025-12-04T09:42:51.9949822Z ++ script_dir=/var/lib/jenkins/workspace/.ci/pytorch
2025-12-04T09:42:51.9950638Z ++ [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-pch* ]]
2025-12-04T09:42:51.9951409Z ++ which sccache
2025-12-04T09:42:51.9970705Z ++ [[ -z ossci-compiler-cache-circleci-v2 ]]
2025-12-04T09:42:51.9971257Z ++ sccache --stop-server
2025-12-04T09:42:51.9998384Z ++ true
2025-12-04T09:42:51.9998886Z ++ rm -f /var/lib/jenkins/sccache_error.log
2025-12-04T09:42:52.0009187Z ++ trap_add sccache_epilogue EXIT
2025-12-04T09:42:52.0009830Z ++ trap_add_cmd=sccache_epilogue
2025-12-04T09:42:52.0010362Z ++ shift
2025-12-04T09:42:52.0010629Z ++ for trap_add_name in "$@"
2025-12-04T09:42:52.0016343Z ++++ trap -p EXIT
2025-12-04T09:42:52.0018512Z +++ eval 'extract_trap_cmd '
2025-12-04T09:42:52.0018906Z ++++ extract_trap_cmd
2025-12-04T09:42:52.0019193Z ++++ printf '%s\n' ''
2025-12-04T09:42:52.0019558Z +++ printf '%s\n' sccache_epilogue
2025-12-04T09:42:52.0021375Z ++ trap -- '
2025-12-04T09:42:52.0021826Z sccache_epilogue' EXIT
2025-12-04T09:42:52.0022343Z ++ [[ -n 1 ]]
2025-12-04T09:42:52.0023078Z ++ echo 'Skipping sccache server initialization, setting environment variables'
2025-12-04T09:42:52.0024284Z Skipping sccache server initialization, setting environment variables
2025-12-04T09:42:52.0024986Z ++ export SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:42:52.0025322Z ++ SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:42:52.0025747Z ++ export SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:42:52.0026295Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:42:52.0033514Z ++ export RUST_LOG=sccache::server=error
2025-12-04T09:42:52.0033930Z ++ RUST_LOG=sccache::server=error
2025-12-04T09:42:52.0034281Z ++ sccache --zero-stats
2025-12-04T09:42:52.1156590Z Statistics zeroed.
2025-12-04T09:42:52.1161740Z ++ which ccache
2025-12-04T09:42:52.1188567Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *rocm* ]]
2025-12-04T09:42:52.1189107Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *s390x* ]]
2025-12-04T09:42:52.1189552Z + [[ -d /var/lib/jenkins/workspace ]]
2025-12-04T09:42:52.1192684Z ++ stat -c %u /var/lib/jenkins/workspace
2025-12-04T09:42:52.1210711Z + WORKSPACE_ORIGINAL_OWNER_ID=1000
2025-12-04T09:42:52.1211094Z + trap_add cleanup_workspace EXIT
2025-12-04T09:42:52.1211459Z + trap_add_cmd=cleanup_workspace
2025-12-04T09:42:52.1211768Z + shift
2025-12-04T09:42:52.1212064Z + for trap_add_name in "$@"
2025-12-04T09:42:52.1218624Z +++ trap -p EXIT
2025-12-04T09:42:52.1222090Z ++ eval 'extract_trap_cmd trap -- '\''
2025-12-04T09:42:52.1222519Z sccache_epilogue'\'' EXIT'
2025-12-04T09:42:52.1222858Z +++ extract_trap_cmd trap -- '
2025-12-04T09:42:52.1223186Z sccache_epilogue' EXIT
2025-12-04T09:42:52.1223462Z +++ printf '%s\n' '
2025-12-04T09:42:52.1223732Z sccache_epilogue'
2025-12-04T09:42:52.1224024Z ++ printf '%s\n' cleanup_workspace
2025-12-04T09:42:52.1224961Z + trap -- '
2025-12-04T09:42:52.1225217Z sccache_epilogue
2025-12-04T09:42:52.1225513Z cleanup_workspace' EXIT
2025-12-04T09:42:52.1225855Z + sudo chown -R jenkins /var/lib/jenkins/workspace
2025-12-04T09:42:52.8547125Z + git config --global --add safe.directory /var/lib/jenkins/workspace
2025-12-04T09:42:52.8566235Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:42:52.8569512Z ++ python -c 'import os;import numba.cuda; print(os.path.dirname(numba.cuda.__file__))'
2025-12-04T09:42:53.3417933Z + NUMBA_CUDA_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:42:53.3418719Z + '[' -n /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ']'
2025-12-04T09:42:53.3424603Z +++ realpath .ci/pytorch/test.sh
2025-12-04T09:42:53.3434706Z ++ dirname /var/lib/jenkins/workspace/.ci/pytorch/test.sh
2025-12-04T09:42:53.3458132Z + NUMBA_PATCH=/var/lib/jenkins/workspace/.ci/pytorch/numba-cuda-13.patch
2025-12-04T09:42:53.3458824Z + pushd /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:42:53.3459847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda ~/workspace
2025-12-04T09:42:53.3460399Z + patch -p4
2025-12-04T09:42:53.3473845Z patching file cudadrv/driver.py
2025-12-04T09:42:53.3474220Z Hunk #1 succeeded at 357 (offset -8 lines).
2025-12-04T09:42:53.3486402Z + popd
2025-12-04T09:42:53.3486656Z ~/workspace
2025-12-04T09:42:53.3486939Z + echo 'Environment variables:'
2025-12-04T09:42:53.3487259Z Environment variables:
2025-12-04T09:42:53.3487539Z + env
2025-12-04T09:42:53.3496959Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:42:53.3498011Z CONTINUE_THROUGH_ERROR=True
2025-12-04T09:42:53.3498533Z BUILD_ENVIRONMENT=linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:42:53.3499271Z VLLM_TEST_HUGGING_FACE_TOKEN=***
2025-12-04T09:42:53.3499612Z HOSTNAME=428ca50ff249
2025-12-04T09:42:53.3500283Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_12f782ac-3486-4605-947a-3e1e053e632a
2025-12-04T09:42:53.3501231Z GITHUB_ACTION=__run_3
2025-12-04T09:42:53.3501559Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1
2025-12-04T09:42:53.3501896Z GITHUB_RUN_NUMBER=19107
2025-12-04T09:42:53.3502212Z TEST_CONFIG=legacy_nvidia_driver
2025-12-04T09:42:53.3502565Z GITHUB_REPOSITORY_OWNER_ID=21003710
2025-12-04T09:42:53.3502948Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all
2025-12-04T09:42:53.3503307Z SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:42:53.3503777Z SCRIBE_GRAPHQL_ACCESS_TOKEN=***
2025-12-04T09:42:53.3504167Z GITHUB_TRIGGERING_ACTOR=huydhn
2025-12-04T09:42:53.3504515Z GITHUB_REF_TYPE=branch
2025-12-04T09:42:53.3504871Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:53.3505269Z XLA_CUDA=
2025-12-04T09:42:53.3505514Z NCCL_LIB_DIR=/usr/local/cuda/lib64/
2025-12-04T09:42:53.3505991Z HUGGING_FACE_HUB_TOKEN=***
2025-12-04T09:42:53.3506499Z ***
2025-12-04T09:42:53.3506744Z GITHUB_REPOSITORY_ID=65600975
2025-12-04T09:42:53.3507060Z GITHUB_ACTIONS=true
2025-12-04T09:42:53.3507350Z NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:53.3507755Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:42:53.3508203Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:53.3508656Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:53.3509289Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main
2025-12-04T09:42:53.3509853Z UCC_HOME=/usr
2025-12-04T09:42:53.3510101Z VERBOSE_TEST_LOGS=False
2025-12-04T09:42:53.3510400Z GITHUB_REF=refs/heads/main
2025-12-04T09:42:53.3510706Z SHARD_NUMBER=4
2025-12-04T09:42:53.3510968Z GITHUB_REF_PROTECTED=true
2025-12-04T09:42:53.3511281Z HOME=/var/lib/jenkins
2025-12-04T09:42:53.3511601Z GITHUB_API_URL=https://api.github.com
2025-12-04T09:42:53.3511970Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0
2025-12-04T09:42:53.3512368Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152
2025-12-04T09:42:53.3512767Z USE_SYSTEM_NCCL=1
2025-12-04T09:42:53.3513018Z NUM_TEST_SHARDS=5
2025-12-04T09:42:53.3513277Z UCX_HOME=/usr
2025-12-04T09:42:53.3513946Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_12f782ac-3486-4605-947a-3e1e053e632a
2025-12-04T09:42:53.3515144Z JOB_NAME=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:53.3516290Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_12f782ac-3486-4605-947a-3e1e053e632a
2025-12-04T09:42:53.3517252Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json
2025-12-04T09:42:53.3517858Z GITHUB_EVENT_NAME=schedule
2025-12-04T09:42:53.3518153Z DASHBOARD_TAG=
2025-12-04T09:42:53.3518419Z GITHUB_RUN_ID=19922826259
2025-12-04T09:42:53.3518721Z INSTALLED_OPENBLAS=
2025-12-04T09:42:53.3519427Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_12f782ac-3486-4605-947a-3e1e053e632a
2025-12-04T09:42:53.3520223Z GITHUB_ACTOR=huydhn
2025-12-04T09:42:53.3520486Z PR_NUMBER=
2025-12-04T09:42:53.3520726Z DESIRED_CUDA=12.4
2025-12-04T09:42:53.3521172Z GITHUB_RUN_ATTEMPT=1
2025-12-04T09:42:53.3521478Z ANACONDA_PYTHON_VERSION=3.10
2025-12-04T09:42:53.3521870Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql
2025-12-04T09:42:53.3522358Z TERM=vt100
2025-12-04T09:42:53.3522606Z INSTALLED_VISION=yes
2025-12-04T09:42:53.3522880Z BRANCH=main
2025-12-04T09:42:53.3523123Z SCCACHE_REGION=us-east-1
2025-12-04T09:42:53.3523439Z OPENSSL_ROOT_DIR=/opt/openssl
2025-12-04T09:42:53.3523767Z BUILD_AOT_INDUCTOR_TEST=
2025-12-04T09:42:53.3524057Z CUDA_PATH=/usr/local/cuda
2025-12-04T09:42:53.3524782Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux
2025-12-04T09:42:53.3525469Z GITHUB_SERVER_URL=https://github.com
2025-12-04T09:42:53.3525867Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96
2025-12-04T09:42:53.3526297Z REENABLED_ISSUES=
2025-12-04T09:42:53.3526553Z DOCS=
2025-12-04T09:42:53.3526780Z SHLVL=1
2025-12-04T09:42:53.3526991Z MAX_JOBS=14
2025-12-04T09:42:53.3527243Z GITHUB_ACTOR_ID=475357
2025-12-04T09:42:53.3527648Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:53.3528091Z GITHUB_REF_NAME=main
2025-12-04T09:42:53.3528537Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:42:53.3529035Z GITHUB_JOB=test
2025-12-04T09:42:53.3529283Z NO_TEST_TIMEOUT=False
2025-12-04T09:42:53.3529574Z TD_DISTRIBUTED=False
2025-12-04T09:42:53.3529872Z GITHUB_REPOSITORY=pytorch/pytorch
2025-12-04T09:42:53.3530228Z GITHUB_RETENTION_DAYS=90
2025-12-04T09:42:53.3530526Z OPENSSL_DIR=/opt/openssl
2025-12-04T09:42:53.3530831Z GITHUB_ACTION_REPOSITORY=
2025-12-04T09:42:53.3531761Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:42:53.3532711Z GITHUB_BASE_REF=
2025-12-04T09:42:53.3532974Z INSTALLED_ACL=
2025-12-04T09:42:53.3533512Z ARTIFACTS_FILE_SUFFIX=test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427
2025-12-04T09:42:53.3534124Z CI=true
2025-12-04T09:42:53.3534378Z GITHUB_REPOSITORY_OWNER=pytorch
2025-12-04T09:42:53.3534756Z RUST_LOG=sccache::server=error
2025-12-04T09:42:53.3535061Z JOB_ID=57119749427
2025-12-04T09:42:53.3535324Z GITHUB_HEAD_REF=
2025-12-04T09:42:53.3535585Z GITHUB_ACTION_REF=
2025-12-04T09:42:53.3535929Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
2025-12-04T09:42:53.3536325Z TEST_SHOWLOCALS=False
2025-12-04T09:42:53.3536620Z GITHUB_WORKFLOW=periodic
2025-12-04T09:42:53.3536938Z DEBIAN_FRONTEND=noninteractive
2025-12-04T09:42:53.3537675Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_12f782ac-3486-4605-947a-3e1e053e632a
2025-12-04T09:42:53.3538431Z NO_TD=False
2025-12-04T09:42:53.3538694Z SKIP_SCCACHE_INITIALIZATION=1
2025-12-04T09:42:53.3539034Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/
2025-12-04T09:42:53.3539400Z _=/usr/bin/env
2025-12-04T09:42:53.3539815Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:42:53.3540433Z ++ python -c 'import site; print(site.getsitepackages()[0])'
2025-12-04T09:42:53.3649123Z + TORCH_INSTALL_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch
2025-12-04T09:42:53.3649937Z + TORCH_BIN_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/bin
2025-12-04T09:42:53.3650649Z + TORCH_LIB_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/lib
2025-12-04T09:42:53.3651359Z + TORCH_TEST_DIR=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/test
2025-12-04T09:42:53.3651982Z + BUILD_DIR=build
2025-12-04T09:42:53.3652259Z + BUILD_RENAMED_DIR=build_renamed
2025-12-04T09:42:53.3652610Z + BUILD_BIN_DIR=build/bin
2025-12-04T09:42:53.3652908Z + SHARD_NUMBER=4
2025-12-04T09:42:53.3653160Z + NUM_TEST_SHARDS=5
2025-12-04T09:42:53.3653468Z + export TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:42:53.3653831Z + TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:42:53.3654141Z + export VALGRIND=ON
2025-12-04T09:42:53.3654421Z + VALGRIND=ON
2025-12-04T09:42:53.3654919Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *clang9* ]]
2025-12-04T09:42:53.3655391Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *xpu* ]]
2025-12-04T09:42:53.3655775Z + detect_cuda_arch
2025-12-04T09:42:53.3656097Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:42:53.3656506Z + command -v nvidia-smi
2025-12-04T09:42:53.3656787Z /usr/bin/nvidia-smi
2025-12-04T09:42:53.3659227Z ++ nvidia-smi --query-gpu=compute_cap --format=csv
2025-12-04T09:42:53.3659927Z ++ tail -n 1
2025-12-04T09:42:53.3890022Z + TORCH_CUDA_ARCH_LIST=7.5
2025-12-04T09:42:53.3890615Z + export TORCH_CUDA_ARCH_LIST
2025-12-04T09:42:53.3891005Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *s390x* ]]
2025-12-04T09:42:53.3891413Z + [[ 0 == \1 ]]
2025-12-04T09:42:53.3891649Z + [[ True == \1 ]]
2025-12-04T09:42:53.3891981Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *bazel* ]]
2025-12-04T09:42:53.3894855Z ++ realpath build/custom_test_artifacts
2025-12-04T09:42:53.3925589Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts
2025-12-04T09:42:53.3926184Z + [[ -n '' ]]
2025-12-04T09:42:53.3926462Z + echo 'Environment variables'
2025-12-04T09:42:53.3926786Z Environment variables
2025-12-04T09:42:53.3927069Z + env
2025-12-04T09:42:53.3951162Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T09:42:53.3951973Z CONTINUE_THROUGH_ERROR=True
2025-12-04T09:42:53.3952381Z BUILD_ENVIRONMENT=linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T09:42:53.3953625Z VLLM_TEST_HUGGING_FACE_TOKEN=***
2025-12-04T09:42:53.3954136Z HOSTNAME=428ca50ff249
2025-12-04T09:42:53.3954847Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_12f782ac-3486-4605-947a-3e1e053e632a
2025-12-04T09:42:53.3956075Z GITHUB_ACTION=__run_3
2025-12-04T09:42:53.3956386Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1
2025-12-04T09:42:53.3956741Z GITHUB_RUN_NUMBER=19107
2025-12-04T09:42:53.3957038Z TEST_CONFIG=legacy_nvidia_driver
2025-12-04T09:42:53.3957394Z GITHUB_REPOSITORY_OWNER_ID=21003710
2025-12-04T09:42:53.3957785Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all
2025-12-04T09:42:53.3958142Z SCCACHE_IDLE_TIMEOUT=0
2025-12-04T09:42:53.3958603Z SCRIBE_GRAPHQL_ACCESS_TOKEN=***
2025-12-04T09:42:53.3958946Z GITHUB_TRIGGERING_ACTOR=huydhn
2025-12-04T09:42:53.3959275Z GITHUB_REF_TYPE=branch
2025-12-04T09:42:53.3959780Z TORCH_CUDA_ARCH_LIST=7.5
2025-12-04T09:42:53.3960415Z BASE_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:53.3961140Z XLA_CUDA=
2025-12-04T09:42:53.3961599Z NCCL_LIB_DIR=/usr/local/cuda/lib64/
2025-12-04T09:42:53.3962680Z HUGGING_FACE_HUB_TOKEN=***
2025-12-04T09:42:53.3963330Z ***
2025-12-04T09:42:53.3963562Z GITHUB_REPOSITORY_ID=65600975
2025-12-04T09:42:53.3963892Z GITHUB_ACTIONS=true
2025-12-04T09:42:53.3964185Z NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T09:42:53.3964629Z SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log
2025-12-04T09:42:53.3965094Z SHA1=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:53.3965531Z GITHUB_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:53.3966168Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/periodic.yml@refs/heads/main
2025-12-04T09:42:53.3966739Z UCC_HOME=/usr
2025-12-04T09:42:53.3966994Z TORCH_SERIALIZATION_DEBUG=1
2025-12-04T09:42:53.3967313Z VERBOSE_TEST_LOGS=False
2025-12-04T09:42:53.3967614Z GITHUB_REF=refs/heads/main
2025-12-04T09:42:53.3967898Z SHARD_NUMBER=4
2025-12-04T09:42:53.3968164Z GITHUB_REF_PROTECTED=true
2025-12-04T09:42:53.3968465Z HOME=/var/lib/jenkins
2025-12-04T09:42:53.3968774Z GITHUB_API_URL=https://api.github.com
2025-12-04T09:42:53.3969169Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0
2025-12-04T09:42:53.3969575Z UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152
2025-12-04T09:42:53.3969960Z USE_SYSTEM_NCCL=1
2025-12-04T09:42:53.3970226Z NUM_TEST_SHARDS=5
2025-12-04T09:42:53.3970485Z UCX_HOME=/usr
2025-12-04T09:42:53.3971154Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_12f782ac-3486-4605-947a-3e1e053e632a
2025-12-04T09:42:53.3972579Z JOB_NAME=linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T09:42:53.3973761Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_12f782ac-3486-4605-947a-3e1e053e632a
2025-12-04T09:42:53.3974738Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json
2025-12-04T09:42:53.3975354Z GITHUB_EVENT_NAME=schedule
2025-12-04T09:42:53.3975652Z DASHBOARD_TAG=
2025-12-04T09:42:53.3975930Z GITHUB_RUN_ID=19922826259
2025-12-04T09:42:53.3976352Z INSTALLED_OPENBLAS=
2025-12-04T09:42:53.3977084Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_12f782ac-3486-4605-947a-3e1e053e632a
2025-12-04T09:42:53.3977906Z GITHUB_ACTOR=huydhn
2025-12-04T09:42:53.3978182Z PR_NUMBER=
2025-12-04T09:42:53.3978417Z DESIRED_CUDA=12.4
2025-12-04T09:42:53.3978689Z GITHUB_RUN_ATTEMPT=1
2025-12-04T09:42:53.3978968Z VALGRIND=ON
2025-12-04T09:42:53.3979221Z ANACONDA_PYTHON_VERSION=3.10
2025-12-04T09:42:53.3979623Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql
2025-12-04T09:42:53.3980042Z TERM=vt100
2025-12-04T09:42:53.3980288Z INSTALLED_VISION=yes
2025-12-04T09:42:53.3980574Z BRANCH=main
2025-12-04T09:42:53.3980827Z SCCACHE_REGION=us-east-1
2025-12-04T09:42:53.3981132Z OPENSSL_ROOT_DIR=/opt/openssl
2025-12-04T09:42:53.3981467Z BUILD_AOT_INDUCTOR_TEST=
2025-12-04T09:42:53.3981780Z CUDA_PATH=/usr/local/cuda
2025-12-04T09:42:53.3982395Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux
2025-12-04T09:42:53.3983086Z GITHUB_SERVER_URL=https://github.com
2025-12-04T09:42:53.3983518Z UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96
2025-12-04T09:42:53.3983922Z REENABLED_ISSUES=
2025-12-04T09:42:53.3984171Z DOCS=
2025-12-04T09:42:53.3984395Z SHLVL=1
2025-12-04T09:42:53.3984625Z MAX_JOBS=14
2025-12-04T09:42:53.3984864Z GITHUB_ACTOR_ID=475357
2025-12-04T09:42:53.3985265Z GITHUB_WORKFLOW_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T09:42:53.3985730Z GITHUB_REF_NAME=main
2025-12-04T09:42:53.3986161Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla
2025-12-04T09:42:53.3986670Z GITHUB_JOB=test
2025-12-04T09:42:53.3986933Z NO_TEST_TIMEOUT=False
2025-12-04T09:42:53.3987206Z TD_DISTRIBUTED=False
2025-12-04T09:42:53.3987507Z GITHUB_REPOSITORY=pytorch/pytorch
2025-12-04T09:42:53.3987855Z GITHUB_RETENTION_DAYS=90
2025-12-04T09:42:53.3988144Z OPENSSL_DIR=/opt/openssl
2025-12-04T09:42:53.3988458Z GITHUB_ACTION_REPOSITORY=
2025-12-04T09:42:53.3989391Z PATH=/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:42:53.3990372Z GITHUB_BASE_REF=
2025-12-04T09:42:53.3990630Z INSTALLED_ACL=
2025-12-04T09:42:53.3991173Z ARTIFACTS_FILE_SUFFIX=test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427
2025-12-04T09:42:53.3991794Z CI=true
2025-12-04T09:42:53.3992040Z GITHUB_REPOSITORY_OWNER=pytorch
2025-12-04T09:42:53.3992417Z RUST_LOG=sccache::server=error
2025-12-04T09:42:53.3992733Z JOB_ID=57119749427
2025-12-04T09:42:53.3992986Z GITHUB_HEAD_REF=
2025-12-04T09:42:53.3993249Z GITHUB_ACTION_REF=
2025-12-04T09:42:53.3993587Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
2025-12-04T09:42:53.3993986Z TEST_SHOWLOCALS=False
2025-12-04T09:42:53.3994281Z GITHUB_WORKFLOW=periodic
2025-12-04T09:42:53.3994592Z DEBIAN_FRONTEND=noninteractive
2025-12-04T09:42:53.3995329Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_12f782ac-3486-4605-947a-3e1e053e632a
2025-12-04T09:42:53.3996077Z NO_TD=False
2025-12-04T09:42:53.3996346Z SKIP_SCCACHE_INITIALIZATION=1
2025-12-04T09:42:53.3996700Z NCCL_INCLUDE_DIR=/usr/local/cuda/include/
2025-12-04T09:42:53.3997221Z OLDPWD=/opt/conda/envs/py_3.10/lib/python3.10/site-packages/numba/cuda
2025-12-04T09:42:53.3997729Z _=/usr/bin/env
2025-12-04T09:42:53.3997995Z + echo 'Testing pytorch'
2025-12-04T09:42:53.3998281Z Testing pytorch
2025-12-04T09:42:53.3998659Z + export LANG=C.UTF-8
2025-12-04T09:42:53.3998950Z + LANG=C.UTF-8
2025-12-04T09:42:53.3999191Z + PR_NUMBER=
2025-12-04T09:42:53.3999482Z + [[ legacy_nvidia_driver == \d\e\f\a\u\l\t ]]
2025-12-04T09:42:53.3999918Z + [[ legacy_nvidia_driver == \d\i\s\t\r\i\b\u\t\e\d ]]
2025-12-04T09:42:53.4000329Z + [[ legacy_nvidia_driver == \s\l\o\w ]]
2025-12-04T09:42:53.4000795Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *slow-gradcheck* ]]
2025-12-04T09:42:53.4001575Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:42:53.4002230Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda
2025-12-04T09:42:53.4002626Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda
2025-12-04T09:42:53.4003007Z + [[ legacy_nvidia_driver == *crossref* ]]
2025-12-04T09:42:53.4003428Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]]
2025-12-04T09:42:53.4003864Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *xpu* ]]
2025-12-04T09:42:53.4004327Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *-bazel-* ]]
2025-12-04T09:42:53.4004747Z + pip_install ninja==1.10.2
2025-12-04T09:42:53.4005168Z + pip_install_pkg='python3 -m pip install --progress-bar off'
2025-12-04T09:42:53.4005716Z + python3 -m pip install --progress-bar off ninja==1.10.2
2025-12-04T09:42:53.8328753Z Collecting ninja==1.10.2
2025-12-04T09:42:53.8598345Z   Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl.metadata (5.0 kB)
2025-12-04T09:42:53.8713637Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB)
2025-12-04T09:42:54.2993211Z Installing collected packages: ninja
2025-12-04T09:42:54.2993707Z   Attempting uninstall: ninja
2025-12-04T09:42:54.3001934Z     Found existing installation: ninja 1.11.1.4
2025-12-04T09:42:54.3026174Z     Uninstalling ninja-1.11.1.4:
2025-12-04T09:42:54.3094528Z       Successfully uninstalled ninja-1.11.1.4
2025-12-04T09:42:54.3475943Z Successfully installed ninja-1.10.2
2025-12-04T09:42:54.4145433Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:42:54.4147396Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2025-12-04T09:42:54.4148709Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *aarch64* ]]
2025-12-04T09:42:54.4149201Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *asan* ]]
2025-12-04T09:42:54.4149774Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-debug* ]]
2025-12-04T09:42:54.4150260Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 != *-bazel-* ]]
2025-12-04T09:42:54.4150917Z + echo 'We are not in debug mode: linux-jammy-cuda12.4-py3.10-gcc11. Expect the assertion to pass'
2025-12-04T09:42:54.4151748Z We are not in debug mode: linux-jammy-cuda12.4-py3.10-gcc11. Expect the assertion to pass
2025-12-04T09:42:54.4152322Z + cd test
2025-12-04T09:42:54.4152721Z + python -c 'import torch; torch._C._crash_if_debug_asserts_fail(424242)'
2025-12-04T09:42:56.1655108Z + [[ legacy_nvidia_driver == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]]
2025-12-04T09:42:56.1655654Z + [[ legacy_nvidia_driver == \n\o\g\p\u\_\A\V\X\5\1\2 ]]
2025-12-04T09:42:56.1656189Z + [[ legacy_nvidia_driver == \l\e\g\a\c\y\_\n\v\i\d\i\a\_\d\r\i\v\e\r ]]
2025-12-04T09:42:56.1657424Z + cd test
2025-12-04T09:42:56.1658523Z + python -c 'import torch; torch.rand(2, 2, device='\''cuda'\'')'
2025-12-04T09:43:01.0771033Z + export USE_LEGACY_DRIVER=1
2025-12-04T09:43:01.0771446Z + USE_LEGACY_DRIVER=1
2025-12-04T09:43:01.0777394Z + DYNAMO_BENCHMARK_FLAGS=()
2025-12-04T09:43:01.0778560Z + [[ legacy_nvidia_driver == *pr_time_benchmarks* ]]
2025-12-04T09:43:01.0779003Z + [[ legacy_nvidia_driver == *dynamo_eager* ]]
2025-12-04T09:43:01.0779414Z + [[ legacy_nvidia_driver == *aot_eager* ]]
2025-12-04T09:43:01.0779834Z + [[ legacy_nvidia_driver == *aot_inductor* ]]
2025-12-04T09:43:01.0780255Z + [[ legacy_nvidia_driver == *max_autotune_inductor* ]]
2025-12-04T09:43:01.0780954Z + [[ legacy_nvidia_driver == *inductor* ]]
2025-12-04T09:43:01.0781355Z + [[ legacy_nvidia_driver == *dynamic* ]]
2025-12-04T09:43:01.0781736Z + [[ legacy_nvidia_driver == *cpu* ]]
2025-12-04T09:43:01.0782084Z + [[ legacy_nvidia_driver == *xpu* ]]
2025-12-04T09:43:01.0782474Z + DYNAMO_BENCHMARK_FLAGS+=(--device cuda)
2025-12-04T09:43:01.0816449Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *libtorch* ]]
2025-12-04T09:43:01.0816931Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *-bazel-* ]]
2025-12-04T09:43:01.0820122Z + cd test
2025-12-04T09:43:01.0821055Z + python -c 'import torch; print(torch.__config__.show())'
2025-12-04T09:43:03.8849646Z PyTorch built with:
2025-12-04T09:43:03.8849994Z   - GCC 11.4
2025-12-04T09:43:03.8850306Z   - C++ Version: 201703
2025-12-04T09:43:03.8851050Z   - Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
2025-12-04T09:43:03.8851976Z   - Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
2025-12-04T09:43:03.8852581Z   - OpenMP 201511 (a.k.a. OpenMP 4.5)
2025-12-04T09:43:03.8852983Z   - LAPACK is enabled (usually provided by MKL)
2025-12-04T09:43:03.8853440Z   - NNPACK is enabled
2025-12-04T09:43:03.8853736Z   - CPU capability usage: AVX512
2025-12-04T09:43:03.8854131Z   - CUDA Runtime 12.4
2025-12-04T09:43:03.8854536Z   - NVCC architecture flags: -gencode;arch=compute_75,code=sm_75
2025-12-04T09:43:03.8855051Z   - CuDNN 90.1
2025-12-04T09:43:03.8861388Z   - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, COMMIT_SHA=ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32, CUDA_VERSION=12.4, CUDNN_VERSION=9.1.0, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Werror -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, TORCH_VERSION=2.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=ON, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, USE_XCCL=OFF, USE_XPU=OFF, 
2025-12-04T09:43:03.8868119Z 
2025-12-04T09:43:04.2934604Z + cd test
2025-12-04T09:43:04.2935059Z + python -c 'import torch; print(torch.__config__.parallel_info())'
2025-12-04T09:43:05.7493155Z ATen/Parallel:
2025-12-04T09:43:05.7493524Z 	at::get_num_threads() : 8
2025-12-04T09:43:05.7493879Z 	at::get_num_interop_threads() : 8
2025-12-04T09:43:05.7494239Z OpenMP 201511 (a.k.a. OpenMP 4.5)
2025-12-04T09:43:05.7494600Z 	omp_get_max_threads() : 8
2025-12-04T09:43:05.7495273Z Intel(R) oneAPI Math Kernel Library Version 2024.2-Product Build 20240605 for Intel(R) 64 architecture applications
2025-12-04T09:43:05.7495978Z 	mkl_get_max_threads() : 8
2025-12-04T09:43:05.7496425Z Intel(R) MKL-DNN v3.7.1 (Git Hash 8d263e693366ef8db40acc569cc7d8edf644556d)
2025-12-04T09:43:05.7496943Z std::thread::hardware_concurrency() : 16
2025-12-04T09:43:05.7497321Z Environment variables:
2025-12-04T09:43:05.7497620Z 	OMP_NUM_THREADS : [not set]
2025-12-04T09:43:05.7497945Z 	MKL_NUM_THREADS : [not set]
2025-12-04T09:43:05.7498270Z ATen parallel backend: OpenMP
2025-12-04T09:43:05.7498485Z 
2025-12-04T09:43:06.0700478Z + [[ legacy_nvidia_driver == *numpy_2* ]]
2025-12-04T09:43:06.0701322Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *aarch64* ]]
2025-12-04T09:43:06.0701780Z + [[ legacy_nvidia_driver == *backward* ]]
2025-12-04T09:43:06.0702235Z + [[ legacy_nvidia_driver == *libtorch_agnostic_targetting* ]]
2025-12-04T09:43:06.0703025Z + [[ legacy_nvidia_driver == *xla* ]]
2025-12-04T09:43:06.0703403Z + [[ legacy_nvidia_driver == *vllm* ]]
2025-12-04T09:43:06.0703789Z + [[ legacy_nvidia_driver == *executorch* ]]
2025-12-04T09:43:06.0704199Z + [[ legacy_nvidia_driver == \j\i\t\_\l\e\g\a\c\y ]]
2025-12-04T09:43:06.0704653Z + [[ legacy_nvidia_driver == \q\u\a\n\t\i\z\a\t\i\o\n ]]
2025-12-04T09:43:06.0705133Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *libtorch* ]]
2025-12-04T09:43:06.0705565Z + [[ legacy_nvidia_driver == distributed ]]
2025-12-04T09:43:06.0706187Z + [[ legacy_nvidia_driver == *operator_benchmark* ]]
2025-12-04T09:43:06.0706823Z + [[ legacy_nvidia_driver == *operator_microbenchmark* ]]
2025-12-04T09:43:06.0707313Z + [[ legacy_nvidia_driver == *attention_microbenchmark* ]]
2025-12-04T09:43:06.0707809Z + [[ legacy_nvidia_driver == *inductor_distributed* ]]
2025-12-04T09:43:06.0708261Z + [[ legacy_nvidia_driver == *inductor-halide* ]]
2025-12-04T09:43:06.0708705Z + [[ legacy_nvidia_driver == *inductor-pallas* ]]
2025-12-04T09:43:06.0709165Z + [[ legacy_nvidia_driver == *inductor-triton-cpu* ]]
2025-12-04T09:43:06.0709655Z + [[ legacy_nvidia_driver == *inductor-micro-benchmark* ]]
2025-12-04T09:43:06.0710188Z + [[ legacy_nvidia_driver == *aoti_cross_compile_for_windows* ]]
2025-12-04T09:43:06.0710657Z + [[ legacy_nvidia_driver == *huggingface* ]]
2025-12-04T09:43:06.0711044Z + [[ legacy_nvidia_driver == *timm* ]]
2025-12-04T09:43:06.0711418Z + [[ legacy_nvidia_driver == cachebench ]]
2025-12-04T09:43:06.0711809Z + [[ legacy_nvidia_driver == verify_cachebench ]]
2025-12-04T09:43:06.0712231Z + [[ legacy_nvidia_driver == *torchbench* ]]
2025-12-04T09:43:06.0712659Z + [[ legacy_nvidia_driver == *inductor_cpp_wrapper* ]]
2025-12-04T09:43:06.0713103Z + [[ legacy_nvidia_driver == *inductor_core* ]]
2025-12-04T09:43:06.0713487Z + [[ legacy_nvidia_driver == *inductor* ]]
2025-12-04T09:43:06.0713868Z + [[ legacy_nvidia_driver == *einops* ]]
2025-12-04T09:43:06.0714257Z + [[ legacy_nvidia_driver == *dynamo_core* ]]
2025-12-04T09:43:06.0714660Z + [[ legacy_nvidia_driver == *dynamo_wrapped* ]]
2025-12-04T09:43:06.0715100Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *rocm* ]]
2025-12-04T09:43:06.0715486Z + [[ 4 == 1 ]]
2025-12-04T09:43:06.0715717Z + [[ 4 == 2 ]]
2025-12-04T09:43:06.0715970Z + [[ 4 -gt 2 ]]
2025-12-04T09:43:06.0716234Z + install_torchvision
2025-12-04T09:43:06.0716520Z + local orig_preload
2025-12-04T09:43:06.0716802Z + local commit
2025-12-04T09:43:06.0717066Z ++ get_pinned_commit vision
2025-12-04T09:43:06.0717392Z ++ cat .github/ci_commit_pins/vision.txt
2025-12-04T09:43:06.0721360Z + commit=617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:06.0721881Z + orig_preload=
2025-12-04T09:43:06.0722349Z + '[' -n '' ']'
2025-12-04T09:43:06.0722765Z + [[ linux-jammy-cuda12.4-py3.10-gcc11 == *cuda* ]]
2025-12-04T09:43:06.0723176Z + export FORCE_CUDA=1
2025-12-04T09:43:06.0723460Z + FORCE_CUDA=1
2025-12-04T09:43:06.0723702Z + export WITH_CUDA=1
2025-12-04T09:43:06.0723985Z + WITH_CUDA=1
2025-12-04T09:43:06.0724658Z + pip_build_and_install git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e dist/vision
2025-12-04T09:43:06.0725715Z + local build_target=git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:06.0726375Z + local wheel_dir=dist/vision
2025-12-04T09:43:06.0726699Z + local found_whl=0
2025-12-04T09:43:06.0726990Z + for file in "${wheel_dir}"/*.whl
2025-12-04T09:43:06.0727327Z + [[ -f dist/vision/*.whl ]]
2025-12-04T09:43:06.0727627Z + '[' 0 == 0 ']'
2025-12-04T09:43:06.0728435Z + python3 -m pip wheel --no-build-isolation --no-deps -w dist/vision git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:06.4383413Z Collecting git+https://github.com/pytorch/vision.git@617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:06.4388710Z   Cloning https://github.com/pytorch/vision.git (to revision 617079d944b0e72632311c30ae2bbdf1168b901e) to /tmp/pip-req-build-tejf4bas
2025-12-04T09:43:06.4569215Z   Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-tejf4bas
2025-12-04T09:43:08.1510177Z   Running command git rev-parse -q --verify 'sha^617079d944b0e72632311c30ae2bbdf1168b901e'
2025-12-04T09:43:08.1536390Z   Running command git fetch -q https://github.com/pytorch/vision.git 617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:08.2660297Z   Resolved https://github.com/pytorch/vision.git to commit 617079d944b0e72632311c30ae2bbdf1168b901e
2025-12-04T09:43:11.6868084Z   Preparing metadata (pyproject.toml) ... done
2025-12-04T09:43:11.6906792Z Building wheels for collected packages: torchvision
2025-12-04T09:44:42.1779567Z   Building wheel for torchvision (pyproject.toml) ... done
2025-12-04T09:44:42.1845650Z   Created wheel for torchvision: filename=torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl size=1821672 sha256=2ba3e74afda71e3592904b780596e0d10594a173250b8abb15e1f83b61107b7c
2025-12-04T09:44:42.1848082Z   Stored in directory: /var/lib/jenkins/.cache/pip/wheels/12/b2/29/1f82685c5b5173629e1f36a9b93989ce92ce563e5fb91d27ac
2025-12-04T09:44:42.1889798Z Successfully built torchvision
2025-12-04T09:44:42.2796173Z + for file in "${wheel_dir}"/*.whl
2025-12-04T09:44:42.2796881Z + pip_install_whl dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:44:42.2797723Z + args=('dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl')
2025-12-04T09:44:42.2798279Z + local args
2025-12-04T09:44:42.2798744Z + [[ dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl == *\ * ]]
2025-12-04T09:44:42.2799323Z + for path in "${args[@]}"
2025-12-04T09:44:42.2799877Z + echo 'Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl'
2025-12-04T09:44:42.2800694Z Installing dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:44:42.2801785Z + python3 -mpip install --no-index --no-deps dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:44:42.6472220Z Processing ./dist/vision/torchvision-0.25.0a0+617079d-cp310-cp310-linux_x86_64.whl
2025-12-04T09:44:42.6612264Z Installing collected packages: torchvision
2025-12-04T09:44:43.1845660Z Successfully installed torchvision-0.25.0a0+617079d
2025-12-04T09:44:43.2300240Z + '[' -n '' ']'
2025-12-04T09:44:43.2300559Z + test_python_shard 4
2025-12-04T09:44:43.2301042Z + [[ -z 5 ]]
2025-12-04T09:44:43.2302030Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --exclude-quantization-tests --shard 4 5 --verbose --upload-artifacts-while-running
2025-12-04T09:44:50.1796820Z Downloading https://ossci-metrics.s3.amazonaws.com/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json
2025-12-04T09:44:50.2256507Z Ignoring disabled issues:  ['']
2025-12-04T09:44:50.2371562Z Found test times from artifacts
2025-12-04T09:44:50.2817382Z Found test times from artifacts
2025-12-04T09:44:50.2833235Z Running all tests
2025-12-04T09:44:50.3740250Z Running parallel tests on 1 processes
2025-12-04T09:44:50.3752201Z Name: tests to run (est. time: 289.55min)
2025-12-04T09:44:50.3752598Z   Serial tests (139):
2025-12-04T09:44:50.3752911Z     inductor/test_aot_inductor 4/6
2025-12-04T09:44:50.3753373Z     inductor/test_torchinductor_dynamic_shapes 1/5
2025-12-04T09:44:50.3753861Z     inductor/test_torchinductor_dynamic_shapes 5/5
2025-12-04T09:44:50.3754271Z     inductor/test_kernel_benchmark 1/1
2025-12-04T09:44:50.3754664Z     inductor/test_torchinductor_opinfo 3/17
2025-12-04T09:44:50.3755073Z     inductor/test_torchinductor_opinfo 8/17
2025-12-04T09:44:50.3755469Z     inductor/test_torchinductor_opinfo 13/17
2025-12-04T09:44:50.3755866Z     inductor/test_pattern_matcher 1/1
2025-12-04T09:44:50.3756231Z     inductor/test_cuda_repro 1/1
2025-12-04T09:44:50.3756856Z     inductor/test_cudagraph_trees 1/1
2025-12-04T09:44:50.3757244Z     inductor/test_cuda_select_algorithm 4/5
2025-12-04T09:44:50.3757635Z     inductor/test_deterministic 1/8
2025-12-04T09:44:50.3757981Z     inductor/test_deterministic 6/8
2025-12-04T09:44:50.3758346Z     inductor/test_extension_backend 1/1
2025-12-04T09:44:50.3758723Z     inductor/test_native_matmul 1/2
2025-12-04T09:44:50.3759084Z     dynamo/test_fx_graph_runnable 1/1
2025-12-04T09:44:50.3759430Z     inductor/test_memory 1/1
2025-12-04T09:44:50.3759904Z     dynamo/test_streams 1/1
2025-12-04T09:44:50.3760235Z     inductor/test_unbacked_symints 1/1
2025-12-04T09:44:50.3760611Z     inductor/test_scatter_optimization 1/1
2025-12-04T09:44:50.3761004Z     inductor/test_mix_order_reduction 1/2
2025-12-04T09:44:50.3761374Z     test_transformers 1/1
2025-12-04T09:44:50.3761663Z     test_autograd 1/1
2025-12-04T09:44:50.3762019Z     test_sparse 1/2
2025-12-04T09:44:50.3762287Z     test_decomp 2/17
2025-12-04T09:44:50.3762555Z     test_decomp 7/17
2025-12-04T09:44:50.3762834Z     test_decomp 12/17
2025-12-04T09:44:50.3763113Z     test_decomp 17/17
2025-12-04T09:44:50.3763377Z     test_meta 5/5
2025-12-04T09:44:50.3763650Z     test_nestedtensor 1/4
2025-12-04T09:44:50.3763963Z     test_nestedtensor 4/4
2025-12-04T09:44:50.3764247Z     test_ops 5/11
2025-12-04T09:44:50.3764515Z     test_ops 10/11
2025-12-04T09:44:50.3764795Z     functorch/test_ops 2/7
2025-12-04T09:44:50.3765100Z     functorch/test_ops 7/7
2025-12-04T09:44:50.3765428Z     inductor/test_max_autotune 1/1
2025-12-04T09:44:50.3765788Z     inductor/test_cpu_repro 3/3
2025-12-04T09:44:50.3766146Z     inductor/test_mkldnn_pattern_matcher 2/3
2025-12-04T09:44:50.3766534Z     inductor/test_cpu_select_algorithm 1/1
2025-12-04T09:44:50.3766901Z     test_custom_ops 1/1
2025-12-04T09:44:50.3767200Z     inductor/test_analysis 1/1
2025-12-04T09:44:50.3767519Z     inductor/test_pad_mm 1/1
2025-12-04T09:44:50.3767848Z     inductor/test_triton_syntax 1/1
2025-12-04T09:44:50.3768233Z     inductor/test_triton_extension_backend 1/1
2025-12-04T09:44:50.3768622Z     test_sparse_semi_structured 1/1
2025-12-04T09:44:50.3768987Z     inductor/test_op_completeness 1/1
2025-12-04T09:44:50.3769359Z     inductor/test_subgraph_choice 1/1
2025-12-04T09:44:50.3769720Z     inductor/test_cutedsl_grouped_mm 1/1
2025-12-04T09:44:50.3770106Z     inductor/test_cpp_wrapper_hipify 1/1
2025-12-04T09:44:50.3770487Z     inductor/test_inductor_utils 1/1
2025-12-04T09:44:50.3770878Z     inductor/test_template_heuristics_registry 1/1
2025-12-04T09:44:50.3771301Z     inductor/test_async_compile 1/1
2025-12-04T09:44:50.3771662Z     dynamo/test_deque_reconstruct 1/1
2025-12-04T09:44:50.3772020Z     inductor/test_utils 1/1
2025-12-04T09:44:50.3772327Z     inductor/test_indexing 1/1
2025-12-04T09:44:50.3772676Z     inductor/test_inductor_annotations 1/1
2025-12-04T09:44:50.3773059Z     inductor/test_compile_worker 1/1
2025-12-04T09:44:50.3773398Z     dynamo/test_einops 1/1
2025-12-04T09:44:50.3773731Z     inductor/test_external_callables 1/1
2025-12-04T09:44:50.3774092Z     test_testing 1/1
2025-12-04T09:44:50.3774378Z     dynamo/test_fx_passes_pre_grad 1/1
2025-12-04T09:44:50.3774748Z     export/test_strict_export_v2 1/1
2025-12-04T09:44:50.3775138Z     export/test_functionalized_assertions 1/1
2025-12-04T09:44:50.3775529Z     inductor/test_selective_lowering 1/1
2025-12-04T09:44:50.3775907Z     dynamo/test_base_output 1/1
2025-12-04T09:44:50.3776248Z     inductor/test_lookup_table 1/1
2025-12-04T09:44:50.3776602Z     export/test_serialize 1/1
2025-12-04T09:44:50.3776960Z     inductor/test_move_constructors_to_gpu 1/1
2025-12-04T09:44:50.3777353Z     inductor/test_remote_cache 1/1
2025-12-04T09:44:50.3777726Z     inductor/test_coordinate_descent_tuner 1/1
2025-12-04T09:44:50.3778110Z     inductor/test_inplace_padding 1/1
2025-12-04T09:44:50.3778479Z     inductor/test_cudacodecache 1/1
2025-12-04T09:44:50.3778841Z     inductor/test_minifier_utils 1/1
2025-12-04T09:44:50.3779183Z     inductor/test_debug_trace 1/1
2025-12-04T09:44:50.3779644Z     inductor/test_foreach 1/1
2025-12-04T09:44:50.3779975Z     inductor/test_cache 1/1
2025-12-04T09:44:50.3780278Z     dynamo/test_config 1/1
2025-12-04T09:44:50.3780609Z     dynamo/test_metrics_context 1/1
2025-12-04T09:44:50.3780965Z     export/test_package 1/1
2025-12-04T09:44:50.3781270Z     dynamo/test_nops 1/1
2025-12-04T09:44:50.3781645Z     inductor/test_graph_transform_observer 1/1
2025-12-04T09:44:50.3782130Z     export/test_db 1/1
2025-12-04T09:44:50.3782477Z     dynamo/test_export_mutations 1/1
2025-12-04T09:44:50.3782971Z     inductor/test_config 1/1
2025-12-04T09:44:50.3783311Z     inductor/test_dependencies 1/1
2025-12-04T09:44:50.3783665Z     inductor/test_fuzzer 1/1
2025-12-04T09:44:50.3783974Z     dynamo/test_global 1/1
2025-12-04T09:44:50.3784296Z     inductor/test_control_flow 1/4
2025-12-04T09:44:50.3784650Z     dynamo/test_cudagraphs 1/1
2025-12-04T09:44:50.3784974Z     inductor/test_alignment 1/1
2025-12-04T09:44:50.3785313Z     dynamo/test_profiler 1/1
2025-12-04T09:44:50.3785664Z     dynamo/test_guard_serialization 1/1
2025-12-04T09:44:50.3786019Z     dynamo/test_dicts 1/1
2025-12-04T09:44:50.3786336Z     dynamo/test_optimizers 1/1
2025-12-04T09:44:50.3786670Z     export/test_torchbind 1/1
2025-12-04T09:44:50.3787007Z     dynamo/test_python_dispatcher 1/1
2025-12-04T09:44:50.3787365Z     export/test_swap 1/1
2025-12-04T09:44:50.3787672Z     export/test_unflatten 1/1
2025-12-04T09:44:50.3788001Z     dynamo/test_verify_correctness 1/1
2025-12-04T09:44:50.3788375Z     inductor/test_fxir_backend 1/1
2025-12-04T09:44:50.3788740Z     dynamo/test_structured_trace 1/1
2025-12-04T09:44:50.3789101Z     dynamo/test_torchrec 1/1
2025-12-04T09:44:50.3789423Z     test_model_exports_to_core_aten 1/1
2025-12-04T09:44:50.3789800Z     dynamo/test_precompile_context 1/1
2025-12-04T09:44:50.3790168Z     dynamo/test_trace_rules 1/1
2025-12-04T09:44:50.3790482Z     export/test_upgrader 1/1
2025-12-04T09:44:50.3790797Z     dynamo/test_hooks 1/1
2025-12-04T09:44:50.3791107Z     dynamo/test_generator 1/1
2025-12-04T09:44:50.3791419Z     export/test_verifier 1/1
2025-12-04T09:44:50.3791739Z     export/test_sparse 2/2
2025-12-04T09:44:50.3792054Z     functorch/test_ac 1/1
2025-12-04T09:44:50.3792346Z     test_out_dtype_op 1/1
2025-12-04T09:44:50.3792664Z     torch_np/test_ufuncs_basic 1/1
2025-12-04T09:44:50.3793023Z     lazy/test_step_closures 1/1
2025-12-04T09:44:50.3793388Z     functorch/dim/test_getsetitem 1/1
2025-12-04T09:44:50.3793872Z     test_fx 1/1
2025-12-04T09:44:50.3794137Z     test_autocast 1/1
2025-12-04T09:44:50.3794421Z     test_logging 1/1
2025-12-04T09:44:50.3794693Z     test_python_dispatch 1/1
2025-12-04T09:44:50.3795011Z     nn/test_lazy_modules 1/1
2025-12-04T09:44:50.3795321Z     nn/test_pruning 1/1
2025-12-04T09:44:50.3795591Z     test_monitor 1/1
2025-12-04T09:44:50.3795873Z     test_cuda_sanitizer 1/1
2025-12-04T09:44:50.3796191Z     test_bundled_inputs 1/1
2025-12-04T09:44:50.3796524Z     torch_np/numpy_tests/core/test_numeric 1/1
2025-12-04T09:44:50.3796959Z     torch_np/numpy_tests/core/test_multiarray 1/1
2025-12-04T09:44:50.3797349Z     test_itt 1/1
2025-12-04T09:44:50.3797657Z     torch_np/numpy_tests/lib/test_function_base 1/1
2025-12-04T09:44:50.3798056Z     test_masked 1/1
2025-12-04T09:44:50.3798337Z     optim/test_lrscheduler 1/1
2025-12-04T09:44:50.3798647Z     test_datapipe 1/1
2025-12-04T09:44:50.3798938Z     nn/test_convolution 1/1
2025-12-04T09:44:50.3799244Z     test_indexing 1/1
2025-12-04T09:44:50.3799552Z     torch_np/numpy_tests/fft/test_pocketfft 1/1
2025-12-04T09:44:50.3799993Z     torch_np/numpy_tests/lib/test_shape_base_ 1/1
2025-12-04T09:44:50.3800400Z     test_cpp_extensions_jit 1/1
2025-12-04T09:44:50.3800746Z     profiler/test_python_tracer 1/1
2025-12-04T09:44:50.3801638Z     cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility 1/1
2025-12-04T09:44:50.3802303Z     distributions/test_distributions 1/1
2025-12-04T09:44:50.3802682Z   Parallel tests (0):
2025-12-04T09:44:50.3802974Z Name: excluded (est. time: 0.0min)
2025-12-04T09:44:50.3803478Z   Serial tests (0):
2025-12-04T09:44:50.3803757Z   Parallel tests (0):
2025-12-04T09:44:50.3804242Z Running inductor/test_aot_inductor 4/6 ... [2025-12-04 09:44:50.376192][1847.986095395]
2025-12-04T09:44:50.3804818Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:44:50.3806093Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_aot_inductor.py', '--shard-id=4', '--num-shards=6', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:44:50.376617]
2025-12-04T09:53:09.3152687Z 
2025-12-04T09:53:09.3153576Z PRINTING LOG FILE of inductor/test_aot_inductor 4/6 (test/test-reports/inductor.test_aot_inductor_4.6_29241cabee62c0de_.log)
2025-12-04T09:53:09.3154711Z W1204 09:45:03.012000 1725 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:53:09.3155905Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3469ffb5f6430eac.xml
2025-12-04T09:53:09.3156801Z ============================= test session starts ==============================
2025-12-04T09:53:09.3157468Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:53:09.3158076Z cachedir: .pytest_cache
2025-12-04T09:53:09.3158800Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:53:09.3159665Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:53:09.3160045Z configfile: pytest.ini
2025-12-04T09:53:09.3160869Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:53:09.3161771Z collecting ... collected 934 items
2025-12-04T09:53:09.3162245Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T09:53:09.3250342Z Running 152 items in this shard: test/inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_sets_package_cpp, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aot_inductor_consts_cpp_build_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_backward_no_op_logging_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_bmm_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_share_predicate_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_unbacked_symint_closure_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_use_buffers_from_outer_scope_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_reinterpret_view_inputs_outputs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_folding_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fft_c2c_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_foreach_multiple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fp8_view_of_param_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_freezing_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fx_gm_return_tuple_validation_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_input_codegen_with_sympy_expr_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_issue_140766_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_grid_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_linear_dynamic_maxautotune_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_multi_device_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_nested_tensor_from_jagged_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_default_gpu_device_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_normal_functional_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_path_2_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_abs_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_output_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_view_constant_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scatter_fallback_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_dynamic_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_split_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_subclasses_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_symbool_item_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_using_model_name_for_files_cpu, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_weight_on_disk_legacy_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_False_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_True_cpu, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotune_with_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_clamp_decomposition_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_composed_dynamic_size_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_share_predicate_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_simple_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_symint_input_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_unbacked_symint_closure_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_folding_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv_freezing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_d2h_copy_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_grid_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_mmaped_weights_on_disk_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_weight_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_nan_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_no_args_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_hann_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_quantized_linear_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_return_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_same_backing_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scaled_grouped_mm_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sdpa_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_dynamic_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_multi_arch_embed_kernel_binary_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_from_multi_output_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_small_constant_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_stft_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax0_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax1_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_cuda, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_conv_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cuda, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_add_complex_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aliased_buffer_reuse_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_with_constant_folding_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_bmm_multiple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_2_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_composed_dynamic_size_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_mismatched_branch_output_dynamic_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_symint_input_disable_one_pass_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_convolution_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_device_moved_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_scalar_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_embedding_bag_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_graph_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fake_tensor_device_validation_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_kernel_with_symexpr_output_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fill__fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_free_inactive_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_inf_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_issue_140766_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_weight_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nan_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_narrow_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_1_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quantized_linear_bias_none_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_return_view_constant_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_run_with_grad_enabled_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_dtype_failed_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_large_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_same_backing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_scatter_reduce_fallback_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_seq_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_dynamic_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_expr_transitive_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_expr_indexing_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_i64_input_codegen_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_multi_output_arg_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_mps, 
test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_reinterpret_view_mem_leak_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_inactive_constant_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_user_managed_buffer_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_nested_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_simple_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_conv_dynamic_False_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_pytree_inputs_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_profiler_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_backed_symbols_mps, test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_unbacked_symbols_mps
2025-12-04T09:53:09.3337086Z 
2025-12-04T09:53:09.3338302Z inductor/test_aot_inductor.py::TestAOTInductorConfig::test_compile_standalone_sets_package_cpp W1204 09:45:04.906000 1725 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.link_libtorch=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T09:53:09.3340340Z W1204 09:45:04.906000 1725 site-packages/torch/_inductor/utils.py:3815] Overriding: aot_inductor.dynamic_linkage=False when aot_inductor_mode.compile_standalone is True.
2025-12-04T09:53:09.3341313Z PASSED [0.0050s] [  0%]
2025-12-04T09:53:09.3342255Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_aot_inductor_consts_cpp_build_cpu <- test/inductor/test_torchinductor.py PASSED [18.5029s] [  1%]
2025-12-04T09:53:09.3343661Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_backward_no_op_logging_cpu PASSED [0.0066s] [  1%]
2025-12-04T09:53:09.3345149Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_bmm_multiple_dynamic_cpu SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [  2%]
2025-12-04T09:53:09.3347363Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_non_tensor_predicates_dynamic_True_cpu W1204 09:45:23.438000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3349512Z W1204 09:45:23.438000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3350410Z PASSED [5.5119s] [  3%]
2025-12-04T09:53:09.3351304Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_share_predicate_cpu <- test/inductor/test_torchinductor.py PASSED [5.4373s] [  3%]
2025-12-04T09:53:09.3353408Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_unbacked_symint_closure_dynamic_True_cpu W1204 09:45:34.390000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3355534Z W1204 09:45:34.391000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3356979Z W1204 09:45:34.391000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3357881Z PASSED [5.6927s] [  4%]
2025-12-04T09:53:09.3359471Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_use_buffers_from_outer_scope_cpu <- test/inductor/test_torchinductor.py W1204 09:45:40.158000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3361673Z W1204 09:45:40.158000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3363205Z W1204 09:45:40.159000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3364096Z PASSED [5.5548s] [  5%]
2025-12-04T09:53:09.3365732Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_cond_with_reinterpret_view_inputs_outputs_cpu <- test/inductor/test_torchinductor.py W1204 09:45:45.760000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3367965Z W1204 09:45:45.760000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3368872Z PASSED [5.8295s] [  5%]
2025-12-04T09:53:09.3369728Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_constant_folding_cpu <- test/inductor/test_torchinductor.py PASSED [7.0993s] [  6%]
2025-12-04T09:53:09.3371194Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fft_c2c_cpu <- test/inductor/test_torchinductor.py PASSED [5.0658s] [  7%]
2025-12-04T09:53:09.3372500Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_foreach_multiple_dynamic_cpu PASSED [5.1488s] [  7%]
2025-12-04T09:53:09.3373924Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fp8_view_of_param_cpu SKIPPED [0.0003s] (FP8 is only supported on H100+, SM 8.9 and MI300+ devices) [  8%]
2025-12-04T09:53:09.3375444Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_freezing_cpu <- test/inductor/test_torchinductor.py PASSED [5.1422s] [  9%]
2025-12-04T09:53:09.3376963Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_fx_gm_return_tuple_validation_cpu <- test/inductor/test_torchinductor.py PASSED [0.0282s] [  9%]
2025-12-04T09:53:09.3378660Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_input_codegen_with_sympy_expr_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0030s] (requires GPU) [ 10%]
2025-12-04T09:53:09.3380095Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_issue_140766_cpu PASSED [8.1427s] [ 11%]
2025-12-04T09:53:09.3381443Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_large_grid_cpu <- test/inductor/test_torchinductor.py SKIPPED [0.0032s] (requires GPU) [ 11%]
2025-12-04T09:53:09.3383225Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_linear_dynamic_maxautotune_cpu SKIPPED [0.0002s] (Skipping triton backend only since not big GPU (not enough SM)) [ 12%]
2025-12-04T09:53:09.3385270Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_multi_device_cpu <- test/inductor/test_torchinductor.py W1204 09:46:22.519000 1725 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T09:53:09.3386555Z PASSED [10.1890s] [ 13%]
2025-12-04T09:53:09.3387821Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_nested_tensor_from_jagged_cpu <- test/inductor/test_torchinductor.py W1204 09:46:37.583000 1725 site-packages/torch/_export/__init__.py:71] +============================+
2025-12-04T09:53:09.3389436Z W1204 09:46:37.583000 1725 site-packages/torch/_export/__init__.py:72] |     !!!   WARNING   !!!    |
2025-12-04T09:53:09.3390293Z W1204 09:46:37.583000 1725 site-packages/torch/_export/__init__.py:73] +============================+
2025-12-04T09:53:09.3392015Z W1204 09:46:37.584000 1725 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead.
2025-12-04T09:53:09.3394647Z W1204 09:46:37.585000 1725 site-packages/torch/fx/_symbolic_trace.py:53] is_fx_tracing will return true for both fx.symbolic_trace and torch.export. Please use is_fx_tracing_symbolic_tracing() for specifically fx.symbolic_trace or torch.compiler.is_compiling() for specifically torch.export/compile.
2025-12-04T09:53:09.3396164Z PASSED [8.7186s] [ 13%]
2025-12-04T09:53:09.3397022Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_non_default_gpu_device_cpu SKIPPED [0.0003s] (requires multiple cuda devices) [ 14%]
2025-12-04T09:53:09.3398540Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_normal_functional_cpu <- test/inductor/test_torchinductor.py PASSED [4.9339s] [ 15%]
2025-12-04T09:53:09.3400146Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_output_path_2_cpu <- test/inductor/test_torchinductor.py PASSED [5.0270s] [ 15%]
2025-12-04T09:53:09.3401811Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_proxy_executor_abs_cpu <- test/inductor/test_torchinductor.py PASSED [4.9238s] [ 16%]
2025-12-04T09:53:09.3403366Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_repeat_output_cpu <- test/inductor/test_torchinductor.py PASSED [4.9819s] [ 17%]
2025-12-04T09:53:09.3404895Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_return_view_constant_cpu <- test/inductor/test_torchinductor.py PASSED [4.9382s] [ 17%]
2025-12-04T09:53:09.3406431Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_scatter_fallback_cpu <- test/inductor/test_torchinductor.py PASSED [5.4949s] [ 18%]
2025-12-04T09:53:09.3407946Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_dynamic_cpu <- test/inductor/test_torchinductor.py PASSED [5.1186s] [ 19%]
2025-12-04T09:53:09.3409244Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_simple_split_cpu PASSED [5.0945s] [ 19%]
2025-12-04T09:53:09.3410512Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_subclasses_cpu <- test/inductor/test_torchinductor.py PASSED [5.0660s] [ 20%]
2025-12-04T09:53:09.3411976Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_symbool_item_cpu <- test/inductor/test_torchinductor.py PASSED [5.0445s] [ 21%]
2025-12-04T09:53:09.3413492Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_equal_to_1_float_arg_dynamic_False_cpu SKIPPED [0.0031s] (requires GPU) [ 21%]
2025-12-04T09:53:09.3415133Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_1_dynamic_False_autotune_True_cpu SKIPPED [0.0030s] (requires GPU) [ 22%]
2025-12-04T09:53:09.3416944Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_False_cpu SKIPPED [0.0030s] (requires GPU) [ 23%]
2025-12-04T09:53:09.3418672Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_cpu SKIPPED [0.0028s] (requires GPU) [ 23%]
2025-12-04T09:53:09.3420372Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_2_num_dims_2_dynamic_True_autotune_True_cpu SKIPPED [0.0027s] (requires GPU) [ 24%]
2025-12-04T09:53:09.3422156Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_True_cpu SKIPPED [0.0028s] (requires GPU) [ 25%]
2025-12-04T09:53:09.3423856Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_1_dynamic_True_autotune_True_cpu SKIPPED [0.0027s] (requires GPU) [ 25%]
2025-12-04T09:53:09.3425578Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_grid_type_3_num_dims_2_dynamic_True_autotune_True_cpu SKIPPED [0.0027s] (requires GPU) [ 26%]
2025-12-04T09:53:09.3427255Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_on_device_tma_dynamic_False_tma_version_new_cpu SKIPPED [0.0027s] (requires GPU) [ 26%]
2025-12-04T09:53:09.3428923Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_1d_dynamic_False_tma_version_old_cpu SKIPPED [0.0027s] (requires GPU) [ 27%]
2025-12-04T09:53:09.3430618Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cpu SKIPPED [0.0030s] (requires GPU) [ 28%]
2025-12-04T09:53:09.3432310Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_old_cpu SKIPPED [0.0027s] (requires GPU) [ 28%]
2025-12-04T09:53:09.3434026Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_False_autotuning_True_cpu SKIPPED [0.0027s] (requires GPU) [ 29%]
2025-12-04T09:53:09.3435785Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cpu Error: Expected u1 >= 1 but received 0
2025-12-04T09:53:09.3436829Z PASSED [10.3166s] [ 30%]
2025-12-04T09:53:09.3437914Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_cpu SKIPPED [0.0031s] (Need triton for user-defined triton kernel) [ 30%]
2025-12-04T09:53:09.3439835Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_cpu SKIPPED [0.0029s] (Need triton for user-defined triton kernel) [ 31%]
2025-12-04T09:53:09.3441762Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_unbacked_expr_replacements_shift_k_3_use_static_size_False_cpu SKIPPED [0.0028s] (Need triton for user-defined triton kernel) [ 32%]
2025-12-04T09:53:09.3443581Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_using_model_name_for_files_cpu <- test/inductor/test_torchinductor.py PASSED [5.0463s] [ 32%]
2025-12-04T09:53:09.3445143Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_weight_on_disk_legacy_cpu <- test/inductor/test_torchinductor.py PASSED [5.2486s] [ 33%]
2025-12-04T09:53:09.3446603Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_False_cpu PASSED [5.7203s] [ 34%]
2025-12-04T09:53:09.3448596Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleCpu::test_while_loop_with_sym_expr_cond_dynamic_True_cpu W1204 09:47:58.251000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3450731Z W1204 09:47:58.251000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3451639Z PASSED [5.7703s] [ 34%]
2025-12-04T09:53:09.3452590Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_autotune_with_constant_folding_cuda <- test/inductor/test_torchinductor.py PASSED [6.7599s] [ 35%]
2025-12-04T09:53:09.3454193Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_clamp_decomposition_cuda <- test/inductor/test_torchinductor.py PASSED [11.5150s] [ 36%]
2025-12-04T09:53:09.3455893Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_composed_dynamic_size_cuda <- test/inductor/test_torchinductor.py PASSED [6.3204s] [ 36%]
2025-12-04T09:53:09.3457455Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_share_predicate_cuda <- test/inductor/test_torchinductor.py PASSED [6.2938s] [ 37%]
2025-12-04T09:53:09.3459902Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_simple_cuda <- test/inductor/test_torchinductor.py W1204 09:48:34.826000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3462014Z W1204 09:48:34.826000 1725 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.3462912Z PASSED [6.4230s] [ 38%]
2025-12-04T09:53:09.3463789Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_symint_input_cuda <- test/inductor/test_torchinductor.py PASSED [6.5759s] [ 38%]
2025-12-04T09:53:09.3465252Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_cond_unbacked_symint_closure_dynamic_False_cuda PASSED [6.2088s] [ 39%]
2025-12-04T09:53:09.3467178Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_constant_folding_cuda <- test/inductor/test_torchinductor.py W1204 09:48:55.538000 1725 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:53:09.3468554Z PASSED [6.9521s] [ 40%]
2025-12-04T09:53:09.3469408Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_conv_freezing_cuda <- test/inductor/test_torchinductor.py PASSED [12.8475s] [ 40%]
2025-12-04T09:53:09.3471222Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_d2h_copy_cuda <- test/inductor/test_torchinductor.py W1204 09:49:13.890000 1725 site-packages/torch/_inductor/utils.py:2565] DeviceCopy in input program
2025-12-04T09:53:09.3472464Z PASSED [6.4041s] [ 41%]
2025-12-04T09:53:09.3473478Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.1789s] [ 42%]
2025-12-04T09:53:09.3475260Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.1410s] [ 42%]
2025-12-04T09:53:09.3476954Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py FAILED [0.1403s] [ 42%]
2025-12-04T09:53:09.3477828Z 
2025-12-04T09:53:09.3477978Z ==================================== RERUNS ====================================
2025-12-04T09:53:09.3478609Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______
2025-12-04T09:53:09.3479197Z Traceback (most recent call last):
2025-12-04T09:53:09.3479853Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T09:53:09.3480511Z     return value(self)
2025-12-04T09:53:09.3481186Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion
2025-12-04T09:53:09.3481957Z     self.check_model(model, inps)
2025-12-04T09:53:09.3482695Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:53:09.3483499Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:53:09.3484113Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:53:09.3484798Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:53:09.3485486Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:53:09.3486230Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:53:09.3487103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:53:09.3487973Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:53:09.3488786Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3489561Z     raise e
2025-12-04T09:53:09.3490252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3491038Z     return func(
2025-12-04T09:53:09.3491757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:53:09.3492671Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:53:09.3493511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:53:09.3494229Z     return compile_fx_aot(
2025-12-04T09:53:09.3494922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:53:09.3495685Z     compiled_artifacts = compile_fx(
2025-12-04T09:53:09.3496408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:53:09.3497133Z     return compile_fx(
2025-12-04T09:53:09.3497782Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:53:09.3498536Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:53:09.3499387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:53:09.3500206Z     return _compile_fx_main(
2025-12-04T09:53:09.3501094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:53:09.3501963Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:53:09.3502835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:53:09.3503643Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3504439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:53:09.3505214Z     return compile_fx_forward(
2025-12-04T09:53:09.3505952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:53:09.3506711Z     return inner_compile(
2025-12-04T09:53:09.3507191Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:53:09.3507730Z     return func(*args, **kwds)
2025-12-04T09:53:09.3508438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:53:09.3509359Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:53:09.3510263Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:53:09.3511084Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3512030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:53:09.3512883Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:53:09.3513708Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:53:09.3514507Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:53:09.3515330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:53:09.3516411Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:53:09.3517407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:53:09.3518200Z     _check_triton_bf16_support(graph)
2025-12-04T09:53:09.3519011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:53:09.3519821Z     warn_and_skip(node.get_device())
2025-12-04T09:53:09.3520557Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:53:09.3521328Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:53:09.3521857Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.3522311Z 
2025-12-04T09:53:09.3522529Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.3523544Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3524329Z 
2025-12-04T09:53:09.3524612Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.3525254Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3525718Z unimplemented []
2025-12-04T09:53:09.3526048Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3526443Z inductor []
2025-12-04T09:53:09.3526676Z graph_break []
2025-12-04T09:53:09.3527049Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3528235Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3529294Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3530264Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3531236Z   warnings.warn(
2025-12-04T09:53:09.3531736Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______
2025-12-04T09:53:09.3532323Z Traceback (most recent call last):
2025-12-04T09:53:09.3532969Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T09:53:09.3533626Z     return value(self)
2025-12-04T09:53:09.3534300Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion
2025-12-04T09:53:09.3535069Z     self.check_model(model, inps)
2025-12-04T09:53:09.3535737Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:53:09.3536438Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:53:09.3537044Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:53:09.3537723Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:53:09.3538407Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:53:09.3539148Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:53:09.3540099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:53:09.3540905Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:53:09.3541714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3542484Z     raise e
2025-12-04T09:53:09.3543175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3543956Z     return func(
2025-12-04T09:53:09.3544741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:53:09.3545657Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:53:09.3546495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:53:09.3547211Z     return compile_fx_aot(
2025-12-04T09:53:09.3547904Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:53:09.3548674Z     compiled_artifacts = compile_fx(
2025-12-04T09:53:09.3549399Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:53:09.3550123Z     return compile_fx(
2025-12-04T09:53:09.3550776Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:53:09.3551530Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:53:09.3552376Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:53:09.3553201Z     return _compile_fx_main(
2025-12-04T09:53:09.3553920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:53:09.3554776Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:53:09.3555642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:53:09.3556449Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3557248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:53:09.3558018Z     return compile_fx_forward(
2025-12-04T09:53:09.3558756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:53:09.3559523Z     return inner_compile(
2025-12-04T09:53:09.3560003Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:53:09.3560545Z     return func(*args, **kwds)
2025-12-04T09:53:09.3561249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:53:09.3562228Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:53:09.3563146Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:53:09.3563968Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3564777Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:53:09.3565632Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:53:09.3566481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:53:09.3567281Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:53:09.3568085Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:53:09.3569189Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:53:09.3570192Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:53:09.3570976Z     _check_triton_bf16_support(graph)
2025-12-04T09:53:09.3571785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:53:09.3572610Z     warn_and_skip(node.get_device())
2025-12-04T09:53:09.3573410Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:53:09.3574174Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:53:09.3574701Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.3575088Z 
2025-12-04T09:53:09.3575319Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.3576331Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3577121Z 
2025-12-04T09:53:09.3577392Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.3578029Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3578508Z unimplemented []
2025-12-04T09:53:09.3578826Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3579221Z inductor []
2025-12-04T09:53:09.3579471Z graph_break []
2025-12-04T09:53:09.3579833Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3581020Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3582094Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3583063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3584032Z   warnings.warn(
2025-12-04T09:53:09.3584418Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3584894Z unimplemented []
2025-12-04T09:53:09.3585219Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3585601Z inductor []
2025-12-04T09:53:09.3585844Z graph_break []
2025-12-04T09:53:09.3586222Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3587392Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3588467Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3589427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3590405Z   warnings.warn(
2025-12-04T09:53:09.3590708Z =================================== FAILURES ===================================
2025-12-04T09:53:09.3591337Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______
2025-12-04T09:53:09.3591937Z Traceback (most recent call last):
2025-12-04T09:53:09.3592564Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T09:53:09.3593218Z     return value(self)
2025-12-04T09:53:09.3593907Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion
2025-12-04T09:53:09.3594675Z     self.check_model(model, inps)
2025-12-04T09:53:09.3595331Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:53:09.3596042Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:53:09.3596743Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:53:09.3597411Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:53:09.3598101Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:53:09.3598863Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:53:09.3599737Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:53:09.3600595Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:53:09.3601886Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3602851Z     raise e
2025-12-04T09:53:09.3603525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3604311Z     return func(
2025-12-04T09:53:09.3605038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:53:09.3605972Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:53:09.3606799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:53:09.3607520Z     return compile_fx_aot(
2025-12-04T09:53:09.3608225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:53:09.3608980Z     compiled_artifacts = compile_fx(
2025-12-04T09:53:09.3609705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:53:09.3610426Z     return compile_fx(
2025-12-04T09:53:09.3611079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:53:09.3611827Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:53:09.3612674Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:53:09.3613502Z     return _compile_fx_main(
2025-12-04T09:53:09.3614220Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:53:09.3615054Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:53:09.3615919Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:53:09.3616738Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3617520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:53:09.3618292Z     return compile_fx_forward(
2025-12-04T09:53:09.3619036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:53:09.3619811Z     return inner_compile(
2025-12-04T09:53:09.3620283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:53:09.3620826Z     return func(*args, **kwds)
2025-12-04T09:53:09.3621546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:53:09.3622443Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:53:09.3623359Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:53:09.3624180Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3624997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:53:09.3625993Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:53:09.3626857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:53:09.3627665Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:53:09.3628491Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:53:09.3629486Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:53:09.3630576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:53:09.3631369Z     _check_triton_bf16_support(graph)
2025-12-04T09:53:09.3632206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:53:09.3633058Z     warn_and_skip(node.get_device())
2025-12-04T09:53:09.3633788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:53:09.3634564Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:53:09.3635098Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.3635488Z 
2025-12-04T09:53:09.3635708Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.3636719Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3637527Z 
2025-12-04T09:53:09.3637796Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.3638442Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3638906Z unimplemented []
2025-12-04T09:53:09.3639245Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3639646Z inductor []
2025-12-04T09:53:09.3639885Z graph_break []
2025-12-04T09:53:09.3640265Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3641449Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3642592Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3643540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3644522Z   warnings.warn(
2025-12-04T09:53:09.3644912Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3645370Z unimplemented []
2025-12-04T09:53:09.3645700Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3646096Z inductor []
2025-12-04T09:53:09.3646350Z graph_break []
2025-12-04T09:53:09.3646710Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3647886Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3648955Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3649903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3650880Z   warnings.warn(
2025-12-04T09:53:09.3651264Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3651734Z unimplemented []
2025-12-04T09:53:09.3652049Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3652443Z inductor []
2025-12-04T09:53:09.3652691Z graph_break []
2025-12-04T09:53:09.3653053Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3654311Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3655384Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3656338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3657296Z   warnings.warn(
2025-12-04T09:53:09.3658297Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3469ffb5f6430eac.xml -
2025-12-04T09:53:09.3659362Z =========================== short test summary info ============================
2025-12-04T09:53:09.3660542Z FAILED [0.1403s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.3661518Z 
2025-12-04T09:53:09.3661737Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.3662743Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3663542Z 
2025-12-04T09:53:09.3663809Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.3664401Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:53:09.3664950Z ======== 1 failed, 41 passed, 22 skipped, 2 rerun in 255.81s (0:04:15) =========
2025-12-04T09:53:09.3665430Z Got exit code 1
2025-12-04T09:53:09.3665708Z Retrying single test...
2025-12-04T09:53:09.3666327Z W1204 09:49:32.739000 6810 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:53:09.3667492Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-c74aaedaf90eea12.xml
2025-12-04T09:53:09.3668374Z ============================= test session starts ==============================
2025-12-04T09:53:09.3669044Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:53:09.3669644Z cachedir: .pytest_cache
2025-12-04T09:53:09.3670362Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:53:09.3671149Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:53:09.3671493Z configfile: pytest.ini
2025-12-04T09:53:09.3672229Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:53:09.3673142Z collecting ... collected 934 items / 151 deselected / 783 selected
2025-12-04T09:53:09.3674249Z stepcurrent: skipping 63 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3675224Z Running 1 items in this shard
2025-12-04T09:53:09.3675450Z 
2025-12-04T09:53:09.3676280Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.9725s] [100%]
2025-12-04T09:53:09.3678079Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.1906s] [100%]
2025-12-04T09:53:09.3679776Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py FAILED [0.1403s] [100%]
2025-12-04T09:53:09.3680642Z 
2025-12-04T09:53:09.3680798Z ==================================== RERUNS ====================================
2025-12-04T09:53:09.3681412Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______
2025-12-04T09:53:09.3682077Z Traceback (most recent call last):
2025-12-04T09:53:09.3682825Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T09:53:09.3683471Z     return value(self)
2025-12-04T09:53:09.3684161Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion
2025-12-04T09:53:09.3684932Z     self.check_model(model, inps)
2025-12-04T09:53:09.3685795Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:53:09.3686638Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:53:09.3687259Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:53:09.3687947Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:53:09.3688620Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:53:09.3689382Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:53:09.3690263Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:53:09.3691075Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:53:09.3691874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3692655Z     raise e
2025-12-04T09:53:09.3693347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3694136Z     return func(
2025-12-04T09:53:09.3694844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:53:09.3695773Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:53:09.3696617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:53:09.3697324Z     return compile_fx_aot(
2025-12-04T09:53:09.3698030Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:53:09.3698803Z     compiled_artifacts = compile_fx(
2025-12-04T09:53:09.3699530Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:53:09.3700245Z     return compile_fx(
2025-12-04T09:53:09.3701072Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:53:09.3701833Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:53:09.3702677Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:53:09.3703525Z     return _compile_fx_main(
2025-12-04T09:53:09.3704265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:53:09.3705127Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:53:09.3705988Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:53:09.3706807Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3707605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:53:09.3708385Z     return compile_fx_forward(
2025-12-04T09:53:09.3709250Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:53:09.3710038Z     return inner_compile(
2025-12-04T09:53:09.3710523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:53:09.3711052Z     return func(*args, **kwds)
2025-12-04T09:53:09.3711931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:53:09.3712853Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:53:09.3713773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:53:09.3714575Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3715399Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:53:09.3716336Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:53:09.3717165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:53:09.3717976Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:53:09.3718808Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:53:09.3719813Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:53:09.3720799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:53:09.3721590Z     _check_triton_bf16_support(graph)
2025-12-04T09:53:09.3722448Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:53:09.3723279Z     warn_and_skip(node.get_device())
2025-12-04T09:53:09.3723996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:53:09.3724770Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:53:09.3725297Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.3725685Z 
2025-12-04T09:53:09.3725904Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.3726918Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3727724Z 
2025-12-04T09:53:09.3727997Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.3728642Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3729105Z unimplemented []
2025-12-04T09:53:09.3729438Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3729837Z inductor []
2025-12-04T09:53:09.3730065Z graph_break []
2025-12-04T09:53:09.3730443Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3731649Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3732731Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3733683Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3734661Z   warnings.warn(
2025-12-04T09:53:09.3735162Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______
2025-12-04T09:53:09.3735763Z Traceback (most recent call last):
2025-12-04T09:53:09.3736397Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T09:53:09.3737049Z     return value(self)
2025-12-04T09:53:09.3737737Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion
2025-12-04T09:53:09.3738496Z     self.check_model(model, inps)
2025-12-04T09:53:09.3739165Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:53:09.3739970Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:53:09.3740598Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:53:09.3741270Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:53:09.3741960Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:53:09.3742723Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:53:09.3743581Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:53:09.3744446Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:53:09.3745256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3746042Z     raise e
2025-12-04T09:53:09.3746722Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3747517Z     return func(
2025-12-04T09:53:09.3748237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:53:09.3749163Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:53:09.3750004Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:53:09.3750723Z     return compile_fx_aot(
2025-12-04T09:53:09.3751427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:53:09.3752177Z     compiled_artifacts = compile_fx(
2025-12-04T09:53:09.3752903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:53:09.3753628Z     return compile_fx(
2025-12-04T09:53:09.3754277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:53:09.3755034Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:53:09.3755883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:53:09.3756719Z     return _compile_fx_main(
2025-12-04T09:53:09.3757433Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:53:09.3758285Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:53:09.3759158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:53:09.3759974Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3760757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:53:09.3761528Z     return compile_fx_forward(
2025-12-04T09:53:09.3762331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:53:09.3763097Z     return inner_compile(
2025-12-04T09:53:09.3763590Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:53:09.3764134Z     return func(*args, **kwds)
2025-12-04T09:53:09.3764864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:53:09.3765772Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:53:09.3766681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:53:09.3767499Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3768385Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:53:09.3769236Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:53:09.3770075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:53:09.3770877Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:53:09.3771691Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:53:09.3772781Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:53:09.3773779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:53:09.3774579Z     _check_triton_bf16_support(graph)
2025-12-04T09:53:09.3775390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:53:09.3776224Z     warn_and_skip(node.get_device())
2025-12-04T09:53:09.3776958Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:53:09.3777734Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:53:09.3778250Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.3778650Z 
2025-12-04T09:53:09.3778869Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.3779887Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3780678Z 
2025-12-04T09:53:09.3780946Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.3781591Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3782072Z unimplemented []
2025-12-04T09:53:09.3782415Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3782800Z inductor []
2025-12-04T09:53:09.3783047Z graph_break []
2025-12-04T09:53:09.3783429Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3784606Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3785682Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3786657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3787630Z   warnings.warn(
2025-12-04T09:53:09.3787999Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3788472Z unimplemented []
2025-12-04T09:53:09.3788799Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3789184Z inductor []
2025-12-04T09:53:09.3789433Z graph_break []
2025-12-04T09:53:09.3789801Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3790974Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3792025Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3792983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3793960Z   warnings.warn(
2025-12-04T09:53:09.3794265Z =================================== FAILURES ===================================
2025-12-04T09:53:09.3794898Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______
2025-12-04T09:53:09.3795495Z Traceback (most recent call last):
2025-12-04T09:53:09.3796214Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T09:53:09.3796855Z     return value(self)
2025-12-04T09:53:09.3797545Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion
2025-12-04T09:53:09.3798318Z     self.check_model(model, inps)
2025-12-04T09:53:09.3798979Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:53:09.3799677Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:53:09.3800360Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:53:09.3801206Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:53:09.3801881Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:53:09.3802694Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:53:09.3803566Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:53:09.3804358Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:53:09.3805173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3805961Z     raise e
2025-12-04T09:53:09.3806654Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3807421Z     return func(
2025-12-04T09:53:09.3808149Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:53:09.3809080Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:53:09.3809906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:53:09.3810629Z     return compile_fx_aot(
2025-12-04T09:53:09.3811341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:53:09.3812111Z     compiled_artifacts = compile_fx(
2025-12-04T09:53:09.3812823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:53:09.3813552Z     return compile_fx(
2025-12-04T09:53:09.3814211Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:53:09.3814965Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:53:09.3815797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:53:09.3816631Z     return _compile_fx_main(
2025-12-04T09:53:09.3817347Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:53:09.3818190Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:53:09.3819054Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:53:09.3819874Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3820673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:53:09.3821428Z     return compile_fx_forward(
2025-12-04T09:53:09.3822173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:53:09.3822945Z     return inner_compile(
2025-12-04T09:53:09.3823417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:53:09.3823955Z     return func(*args, **kwds)
2025-12-04T09:53:09.3824799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:53:09.3825716Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:53:09.3826610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:53:09.3827426Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3828241Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:53:09.3829166Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:53:09.3829989Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:53:09.3830789Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:53:09.3831612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:53:09.3832610Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:53:09.3833612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:53:09.3834412Z     _check_triton_bf16_support(graph)
2025-12-04T09:53:09.3835216Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:53:09.3836021Z     warn_and_skip(node.get_device())
2025-12-04T09:53:09.3836758Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:53:09.3837536Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:53:09.3838071Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.3838464Z 
2025-12-04T09:53:09.3838681Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.3839690Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3840481Z 
2025-12-04T09:53:09.3840758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.3841397Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3841854Z unimplemented []
2025-12-04T09:53:09.3842241Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3842653Z inductor []
2025-12-04T09:53:09.3842888Z graph_break []
2025-12-04T09:53:09.3843267Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3844463Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3845534Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3846510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3847496Z   warnings.warn(
2025-12-04T09:53:09.3847888Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3848348Z unimplemented []
2025-12-04T09:53:09.3848677Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3849072Z inductor []
2025-12-04T09:53:09.3849303Z graph_break []
2025-12-04T09:53:09.3849682Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3850872Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3851945Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3852968Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3853947Z   warnings.warn(
2025-12-04T09:53:09.3854340Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3854799Z unimplemented []
2025-12-04T09:53:09.3855128Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3855524Z inductor []
2025-12-04T09:53:09.3855774Z graph_break []
2025-12-04T09:53:09.3856138Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3857383Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3858449Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3859395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3860367Z   warnings.warn(
2025-12-04T09:53:09.3861290Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-c74aaedaf90eea12.xml -
2025-12-04T09:53:09.3862353Z =========================== short test summary info ============================
2025-12-04T09:53:09.3863520Z FAILED [0.1403s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.3864507Z 
2025-12-04T09:53:09.3864727Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.3865737Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3866528Z 
2025-12-04T09:53:09.3866815Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.3867407Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:53:09.3867937Z ================== 1 failed, 151 deselected, 2 rerun in 1.39s ==================
2025-12-04T09:53:09.3868391Z Got exit code 1
2025-12-04T09:53:09.3868665Z Retrying single test...
2025-12-04T09:53:09.3869290Z W1204 09:49:52.908000 6979 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:53:09.3870451Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-27dd4691bf7b3baf.xml
2025-12-04T09:53:09.3871338Z ============================= test session starts ==============================
2025-12-04T09:53:09.3871989Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:53:09.3872599Z cachedir: .pytest_cache
2025-12-04T09:53:09.3873321Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:53:09.3874111Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:53:09.3874454Z configfile: pytest.ini
2025-12-04T09:53:09.3875183Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:53:09.3876097Z collecting ... collected 934 items / 151 deselected / 783 selected
2025-12-04T09:53:09.3877208Z stepcurrent: skipping 63 already run items. Running only test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3878180Z Running 1 items in this shard
2025-12-04T09:53:09.3878407Z 
2025-12-04T09:53:09.3879249Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.9643s] [100%]
2025-12-04T09:53:09.3881141Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py ('RERUN', {'yellow': True}) [0.1832s] [100%]
2025-12-04T09:53:09.3882912Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda <- test/inductor/test_torchinductor.py FAILED [0.1363s] [100%]
2025-12-04T09:53:09.3883783Z 
2025-12-04T09:53:09.3883931Z ==================================== RERUNS ====================================
2025-12-04T09:53:09.3884573Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______
2025-12-04T09:53:09.3885239Z Traceback (most recent call last):
2025-12-04T09:53:09.3885889Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T09:53:09.3886530Z     return value(self)
2025-12-04T09:53:09.3887216Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion
2025-12-04T09:53:09.3887992Z     self.check_model(model, inps)
2025-12-04T09:53:09.3888658Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:53:09.3889359Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:53:09.3889980Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:53:09.3890655Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:53:09.3891323Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:53:09.3892081Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:53:09.3892960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:53:09.3893744Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:53:09.3894554Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3895338Z     raise e
2025-12-04T09:53:09.3896025Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3896796Z     return func(
2025-12-04T09:53:09.3897509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:53:09.3898435Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:53:09.3899256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:53:09.3899979Z     return compile_fx_aot(
2025-12-04T09:53:09.3900682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:53:09.3901605Z     compiled_artifacts = compile_fx(
2025-12-04T09:53:09.3902316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:53:09.3903044Z     return compile_fx(
2025-12-04T09:53:09.3903705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:53:09.3904458Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:53:09.3905298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:53:09.3906134Z     return _compile_fx_main(
2025-12-04T09:53:09.3906857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:53:09.3907710Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:53:09.3908572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:53:09.3909389Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3910299Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:53:09.3911052Z     return compile_fx_forward(
2025-12-04T09:53:09.3911793Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:53:09.3912569Z     return inner_compile(
2025-12-04T09:53:09.3913037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:53:09.3913579Z     return func(*args, **kwds)
2025-12-04T09:53:09.3914381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:53:09.3915294Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:53:09.3916205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:53:09.3917039Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3917862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:53:09.3918712Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:53:09.3919540Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:53:09.3920346Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:53:09.3921174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:53:09.3922234Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:53:09.3923244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:53:09.3924049Z     _check_triton_bf16_support(graph)
2025-12-04T09:53:09.3924865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:53:09.3925672Z     warn_and_skip(node.get_device())
2025-12-04T09:53:09.3926412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:53:09.3927195Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:53:09.3927723Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.3928114Z 
2025-12-04T09:53:09.3928331Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.3929349Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3930133Z 
2025-12-04T09:53:09.3930414Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.3931054Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3931515Z unimplemented []
2025-12-04T09:53:09.3931850Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3932246Z inductor []
2025-12-04T09:53:09.3932474Z graph_break []
2025-12-04T09:53:09.3932845Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3934031Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.3935097Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.3936059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.3937034Z   warnings.warn(
2025-12-04T09:53:09.3937530Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______
2025-12-04T09:53:09.3938205Z Traceback (most recent call last):
2025-12-04T09:53:09.3938856Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T09:53:09.3939508Z     return value(self)
2025-12-04T09:53:09.3940181Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion
2025-12-04T09:53:09.3940954Z     self.check_model(model, inps)
2025-12-04T09:53:09.3941624Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:53:09.3942387Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:53:09.3942992Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:53:09.3943673Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:53:09.3944356Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:53:09.3945098Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:53:09.3945968Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:53:09.3946770Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:53:09.3947577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3948343Z     raise e
2025-12-04T09:53:09.3949024Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.3949810Z     return func(
2025-12-04T09:53:09.3950525Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:53:09.3951448Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:53:09.3952291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:53:09.3953012Z     return compile_fx_aot(
2025-12-04T09:53:09.3953704Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:53:09.3954475Z     compiled_artifacts = compile_fx(
2025-12-04T09:53:09.3955199Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:53:09.3955922Z     return compile_fx(
2025-12-04T09:53:09.3956568Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:53:09.3957322Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:53:09.3958162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:53:09.3958981Z     return _compile_fx_main(
2025-12-04T09:53:09.3959844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:53:09.3960698Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:53:09.3961559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:53:09.3962430Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3963227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:53:09.3964004Z     return compile_fx_forward(
2025-12-04T09:53:09.3964755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:53:09.3965514Z     return inner_compile(
2025-12-04T09:53:09.3966002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:53:09.3966544Z     return func(*args, **kwds)
2025-12-04T09:53:09.3967340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:53:09.3968257Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:53:09.3969167Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:53:09.3969982Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.3970789Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:53:09.3971699Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:53:09.3972534Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:53:09.3973320Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:53:09.3974142Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:53:09.3975144Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:53:09.3976138Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:53:09.3976915Z     _check_triton_bf16_support(graph)
2025-12-04T09:53:09.3977720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:53:09.3978545Z     warn_and_skip(node.get_device())
2025-12-04T09:53:09.3979280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:53:09.3980035Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:53:09.3980563Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.3980955Z 
2025-12-04T09:53:09.3981188Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.3982190Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.3982978Z 
2025-12-04T09:53:09.3983246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.3996507Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.3997127Z unimplemented []
2025-12-04T09:53:09.3997484Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.3997887Z inductor []
2025-12-04T09:53:09.3998138Z graph_break []
2025-12-04T09:53:09.3998509Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.3999713Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.4000805Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.4001952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.4002991Z   warnings.warn(
2025-12-04T09:53:09.4003388Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.4003858Z unimplemented []
2025-12-04T09:53:09.4004172Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.4004582Z inductor []
2025-12-04T09:53:09.4004823Z graph_break []
2025-12-04T09:53:09.4005185Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.4006364Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.4007432Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.4008609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.4009580Z   warnings.warn(
2025-12-04T09:53:09.4009893Z =================================== FAILURES ===================================
2025-12-04T09:53:09.4010509Z _____ AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda ______
2025-12-04T09:53:09.4011095Z Traceback (most recent call last):
2025-12-04T09:53:09.4011835Z   File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 14842, in new_test
2025-12-04T09:53:09.4012478Z     return value(self)
2025-12-04T09:53:09.4013172Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor.py", line 939, in test_empty_cat_dtype_promotion
2025-12-04T09:53:09.4013934Z     self.check_model(model, inps)
2025-12-04T09:53:09.4014598Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 252, in check_model
2025-12-04T09:53:09.4015291Z     actual = AOTIRunnerUtil.run(
2025-12-04T09:53:09.4015896Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 184, in run
2025-12-04T09:53:09.4016559Z     package_path = AOTIRunnerUtil.compile(
2025-12-04T09:53:09.4017228Z   File "/var/lib/jenkins/workspace/test/inductor/test_aot_inductor_utils.py", line 172, in compile
2025-12-04T09:53:09.4017974Z     package_path = torch._inductor.aoti_compile_and_package(
2025-12-04T09:53:09.4018817Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 151, in aoti_compile_and_package
2025-12-04T09:53:09.4019608Z     return aot_inductor_minifier_wrapper(
2025-12-04T09:53:09.4020405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1336, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.4021160Z     raise e
2025-12-04T09:53:09.4021841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/debug.py", line 1306, in aot_inductor_minifier_wrapper
2025-12-04T09:53:09.4022610Z     return func(
2025-12-04T09:53:09.4023325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 195, in _aoti_compile_and_package_inner
2025-12-04T09:53:09.4024241Z     aoti_files = aot_compile(gm, args, kwargs, options=inductor_configs)
2025-12-04T09:53:09.4025079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/__init__.py", line 311, in aot_compile
2025-12-04T09:53:09.4025793Z     return compile_fx_aot(
2025-12-04T09:53:09.4026495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2007, in compile_fx_aot
2025-12-04T09:53:09.4027243Z     compiled_artifacts = compile_fx(
2025-12-04T09:53:09.4027962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2477, in compile_fx
2025-12-04T09:53:09.4028686Z     return compile_fx(
2025-12-04T09:53:09.4029338Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2516, in compile_fx
2025-12-04T09:53:09.4030083Z     return _maybe_wrap_and_compile_fx_main(
2025-12-04T09:53:09.4030928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2605, in _maybe_wrap_and_compile_fx_main
2025-12-04T09:53:09.4031758Z     return _compile_fx_main(
2025-12-04T09:53:09.4032465Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2788, in _compile_fx_main
2025-12-04T09:53:09.4033343Z     return inference_compiler(unlifted_gm, example_inputs_)
2025-12-04T09:53:09.4034217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/schemas.py", line 1249, in __call__
2025-12-04T09:53:09.4035025Z     return self.compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.4035820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2669, in fw_compiler_base
2025-12-04T09:53:09.4036659Z     return compile_fx_forward(
2025-12-04T09:53:09.4037406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2341, in compile_fx_forward
2025-12-04T09:53:09.4038173Z     return inner_compile(
2025-12-04T09:53:09.4038657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/contextlib.py", line 79, in inner
2025-12-04T09:53:09.4039199Z     return func(*args, **kwds)
2025-12-04T09:53:09.4039906Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 806, in compile_fx_inner
2025-12-04T09:53:09.4040881Z     return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
2025-12-04T09:53:09.4041790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
2025-12-04T09:53:09.4042683Z     inner_compiled_fn = compiler_fn(gm, example_inputs)
2025-12-04T09:53:09.4043498Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 992, in _compile_fx_inner
2025-12-04T09:53:09.4044339Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T09:53:09.4045175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 988, in _compile_fx_inner
2025-12-04T09:53:09.4045981Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T09:53:09.4046790Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T09:53:09.4047799Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T09:53:09.4048799Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1471, in codegen_and_compile
2025-12-04T09:53:09.4049599Z     _check_triton_bf16_support(graph)
2025-12-04T09:53:09.4050398Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2911, in _check_triton_bf16_support
2025-12-04T09:53:09.4051215Z     warn_and_skip(node.get_device())
2025-12-04T09:53:09.4051946Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2894, in warn_and_skip
2025-12-04T09:53:09.4052702Z     raise SkipFrame("BF16 is not supported")
2025-12-04T09:53:09.4053223Z torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.4053623Z 
2025-12-04T09:53:09.4053840Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.4054858Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.4055646Z 
2025-12-04T09:53:09.4055914Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.4056550Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.4057020Z unimplemented []
2025-12-04T09:53:09.4057334Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.4057728Z inductor []
2025-12-04T09:53:09.4057971Z graph_break []
2025-12-04T09:53:09.4058349Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.4059527Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.4060603Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.4061568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.4062542Z   warnings.warn(
2025-12-04T09:53:09.4062918Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.4063392Z unimplemented []
2025-12-04T09:53:09.4063716Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.4064180Z inductor []
2025-12-04T09:53:09.4064424Z graph_break []
2025-12-04T09:53:09.4064799Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.4065963Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.4067031Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.4067990Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.4069028Z   warnings.warn(
2025-12-04T09:53:09.4069400Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T09:53:09.4069866Z unimplemented []
2025-12-04T09:53:09.4070194Z stats [('calls_captured', 3), ('unique_graphs', 1)]
2025-12-04T09:53:09.4070572Z inductor []
2025-12-04T09:53:09.4070812Z graph_break []
2025-12-04T09:53:09.4071192Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T09:53:09.4072372Z /opt/conda/envs/py_3.10/lib/python3.10/copyreg.py:101: FutureWarning: `isinstance(treespec, LeafSpec)` is deprecated, use `isinstance(treespec, TreeSpec) and treespec.is_leaf()` instead.
2025-12-04T09:53:09.4073429Z   return cls.__new__(cls, *args)
2025-12-04T09:53:09.4074379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T09:53:09.4075349Z   warnings.warn(
2025-12-04T09:53:09.4076259Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-27dd4691bf7b3baf.xml -
2025-12-04T09:53:09.4077326Z =========================== short test summary info ============================
2025-12-04T09:53:09.4078511Z FAILED [0.1363s] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda - torch._inductor.exc.InductorError: SkipFrame: BF16 is not supported
2025-12-04T09:53:09.4079487Z 
2025-12-04T09:53:09.4079720Z To execute this test, run the following from the base repo dir:
2025-12-04T09:53:09.4080716Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_aot_inductor.py AOTInductorTestABICompatibleGpu.test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.4081515Z 
2025-12-04T09:53:09.4081781Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T09:53:09.4082451Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T09:53:09.4082988Z ================== 1 failed, 151 deselected, 2 rerun in 1.37s ==================
2025-12-04T09:53:09.4083429Z Got exit code 1
2025-12-04T09:53:09.4084173Z FAILED CONSISTENTLY: test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda
2025-12-04T09:53:09.4085311Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T09:53:09.4086308Z W1204 09:50:12.514000 7148 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:53:09.4087444Z Test results will be stored in test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-471109228b9bc8b1.xml
2025-12-04T09:53:09.4088318Z ============================= test session starts ==============================
2025-12-04T09:53:09.4088985Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T09:53:09.4089593Z cachedir: .pytest_cache
2025-12-04T09:53:09.4090291Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T09:53:09.4091079Z rootdir: /var/lib/jenkins/workspace
2025-12-04T09:53:09.4091434Z configfile: pytest.ini
2025-12-04T09:53:09.4092230Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T09:53:09.4093142Z collecting ... collected 934 items / 64 deselected / 870 selected
2025-12-04T09:53:09.4093654Z stepcurrent: skipping 64 already run items.
2025-12-04T09:53:09.4094050Z Running 88 items in this shard
2025-12-04T09:53:09.4094264Z 
2025-12-04T09:53:09.4094936Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_grid_cuda <- test/inductor/test_torchinductor.py PASSED [9.3493s] [  1%]
2025-12-04T09:53:09.4097014Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_mmaped_weights_on_disk_cuda <- test/inductor/test_torchinductor.py W1204 09:50:25.748000 7148 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T09:53:09.4098461Z PASSED [15.5315s] [  2%]
2025-12-04T09:53:09.4099562Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_large_weight_cuda SKIPPED [0.0004s] (install_free_tensors leads to OOM - https://github.com/pytorch/pytorch/issues/164062) [  3%]
2025-12-04T09:53:09.4101439Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_nan_cuda SKIPPED [0.0002s] (Skip this test, only for local test. SIGABRT is produced.) [  4%]
2025-12-04T09:53:09.4102926Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_no_args_cuda <- test/inductor/test_torchinductor.py PASSED [5.9330s] [  5%]
2025-12-04T09:53:09.4104408Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_proxy_executor_hann_cuda <- test/inductor/test_torchinductor.py PASSED [5.4572s] [  6%]
2025-12-04T09:53:09.4105748Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_quantized_linear_cuda XFAIL [0.0330s] [  7%]
2025-12-04T09:53:09.4107080Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_repeated_user_defined_triton_kernel_embed_kernel_binary_False_cuda PASSED [11.5715s] [  9%]
2025-12-04T09:53:09.4108628Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_return_constant_cuda <- test/inductor/test_torchinductor.py PASSED [5.1252s] [ 10%]
2025-12-04T09:53:09.4110159Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_reuse_kernel_dynamic_cuda <- test/inductor/test_torchinductor.py PASSED [7.8888s] [ 11%]
2025-12-04T09:53:09.4111670Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_same_backing_cuda <- test/inductor/test_torchinductor.py PASSED [6.1924s] [ 12%]
2025-12-04T09:53:09.4113195Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_scaled_grouped_mm_cuda SKIPPED [0.0003s] (scaled_grouped_mm is only supported on SM90) [ 13%]
2025-12-04T09:53:09.4114684Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sdpa_cuda SKIPPED [0.0002s] (bfloat16 only supported in sm80+ or XPU) [ 14%]
2025-12-04T09:53:09.4116142Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_dynamic_cuda <- test/inductor/test_torchinductor.py PASSED [6.0785s] [ 15%]
2025-12-04T09:53:09.4117761Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_simple_multi_arch_embed_kernel_binary_False_cuda SKIPPED [0.0003s] (Test is only supported on CUDA 12.8+) [ 17%]
2025-12-04T09:53:09.4120049Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_size_from_multi_output_cuda <- test/inductor/test_torchinductor.py W1204 09:51:27.909000 7148 site-packages/torch/_inductor/ir.py:8050] [0/0] aten._unique2.default is missing a c-shim implementation, using proxy executor as fallback
2025-12-04T09:53:09.4121645Z PASSED [6.1285s] [ 18%]
2025-12-04T09:53:09.4122557Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_small_constant_cuda <- test/inductor/test_torchinductor.py PASSED [5.2771s] [ 19%]
2025-12-04T09:53:09.4124012Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_stft_cuda <- test/inductor/test_torchinductor.py PASSED [6.5355s] [ 20%]
2025-12-04T09:53:09.4125489Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax0_cuda PASSED [6.3012s] [ 21%]
2025-12-04T09:53:09.4126782Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_sympy_cpp_printer_min_max_minmax1_cuda PASSED [5.9789s] [ 22%]
2025-12-04T09:53:09.4128202Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_1_num_dims_2_dynamic_False_autotune_True_cuda PASSED [9.1404s] [ 23%]
2025-12-04T09:53:09.4129772Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_grid_type_2_num_dims_1_dynamic_False_autotune_False_cuda PASSED [5.8727s] [ 25%]
2025-12-04T09:53:09.4131685Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_on_device_tma_dynamic_True_tma_version_old_cuda SKIPPED [0.0031s] (requires triton.tools.experimental_descriptor TMA support) [ 26%]
2025-12-04T09:53:09.4133761Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_1d_dynamic_True_tma_version_new_cuda SKIPPED [0.0029s] (requires triton.tools.tensor_descriptor TMA support) [ 27%]
2025-12-04T09:53:09.4135816Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_False_tma_version_new_cuda SKIPPED [0.0030s] (requires triton.tools.tensor_descriptor TMA support) [ 28%]
2025-12-04T09:53:09.4137903Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_tma_descriptor_2d_dynamic_True_tma_version_old_cuda SKIPPED [0.0027s] (requires triton.tools.experimental_descriptor TMA support) [ 29%]
2025-12-04T09:53:09.4139787Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_triton_kernel_unbacked_symint_in_grid_dynamic_True_autotuning_False_cuda PASSED [5.9304s] [ 30%]
2025-12-04T09:53:09.4141736Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_cuda W1204 09:52:24.604000 7148 site-packages/torch/_export/__init__.py:71] +============================+
2025-12-04T09:53:09.4143301Z W1204 09:52:24.604000 7148 site-packages/torch/_export/__init__.py:72] |     !!!   WARNING   !!!    |
2025-12-04T09:53:09.4144148Z W1204 09:52:24.604000 7148 site-packages/torch/_export/__init__.py:73] +============================+
2025-12-04T09:53:09.4145861Z W1204 09:52:24.605000 7148 site-packages/torch/_export/__init__.py:74] torch._export.aot_compile()/torch._export.aot_load() is being deprecated, please switch to directly calling torch._inductor.aoti_compile_and_package(torch.export.export())/torch._inductor.aoti_load_package() instead.
2025-12-04T09:53:09.4147355Z Error: Expected u1 >= 1 but received 0
2025-12-04T09:53:09.4147715Z PASSED [11.3883s] [ 31%]
2025-12-04T09:53:09.4148608Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_unbacked_expr_replacements_shift_k_2_use_static_size_True_cuda PASSED [8.0390s] [ 32%]
2025-12-04T09:53:09.4150658Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_conv_dynamic_True_cuda W1204 09:52:39.830000 7148 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.4152111Z PASSED [8.1713s] [ 34%]
2025-12-04T09:53:09.4153287Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_mixed_device_dynamic_False_cuda W1204 09:52:46.937000 7148 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T09:53:09.4154517Z PASSED [6.4180s] [ 35%]
2025-12-04T09:53:09.4155361Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_False_cuda PASSED [6.8472s] [ 36%]
2025-12-04T09:53:09.4157549Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_while_loop_with_unbacked_symint_closure_dynamic_True_cuda W1204 09:52:59.866000 7148 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.4159655Z W1204 09:52:59.866000 7148 site-packages/torch/export/dynamic_shapes.py:923] Using None as a dynamic shape dimension is deprecated. Please use Dim.STATIC instead
2025-12-04T09:53:09.4160559Z PASSED [7.0304s] [ 37%]
2025-12-04T09:53:09.4161504Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_m_32_n_64_q_group_32_num_groups_2_mps SKIPPED [0.0004s] (No MPS backend available) [ 38%]
2025-12-04T09:53:09.4163333Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test__weight_int4pack_mm_with_scales_and_zeros_m_32_n_64_q_group_64_num_groups_1_mps SKIPPED [0.0002s] (No MPS backend available) [ 39%]
2025-12-04T09:53:09.4165262Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_add_complex_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 40%]
2025-12-04T09:53:09.4167043Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_aliased_buffer_reuse_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 42%]
2025-12-04T09:53:09.4168906Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_autotune_with_constant_folding_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0005s] (No MPS backend available) [ 43%]
2025-12-04T09:53:09.4170582Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_bmm_multiple_dynamic_mps SKIPPED [0.0003s] (No MPS backend available) [ 44%]
2025-12-04T09:53:09.4172177Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_buffer_mutation_2_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 45%]
2025-12-04T09:53:09.4173977Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_composed_dynamic_size_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0004s] (No MPS backend available) [ 46%]
2025-12-04T09:53:09.4175707Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_mismatched_branch_output_dynamic_True_mps SKIPPED [0.0003s] (No MPS backend available) [ 47%]
2025-12-04T09:53:09.4177497Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_cond_symint_input_disable_one_pass_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 48%]
2025-12-04T09:53:09.4179145Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_convolution_mps SKIPPED [0.0002s] (No MPS backend available) [ 50%]
2025-12-04T09:53:09.4180718Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_device_moved_constant_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 51%]
2025-12-04T09:53:09.4182511Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_dynamic_scalar_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 52%]
2025-12-04T09:53:09.4184265Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_embedding_bag_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 53%]
2025-12-04T09:53:09.4185989Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_empty_graph_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 54%]
2025-12-04T09:53:09.4187793Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fake_tensor_device_validation_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 55%]
2025-12-04T09:53:09.4189524Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fallback_kernel_with_symexpr_output_mps SKIPPED [0.0002s] (No MPS backend available) [ 56%]
2025-12-04T09:53:09.4191195Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_fill__fallback_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 57%]
2025-12-04T09:53:09.4193044Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_free_inactive_buffer_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 59%]
2025-12-04T09:53:09.4194577Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_inf_mps SKIPPED [0.0002s] (No MPS backend available) [ 60%]
2025-12-04T09:53:09.4195875Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_issue_140766_mps SKIPPED [0.0002s] (No MPS backend available) [ 61%]
2025-12-04T09:53:09.4197211Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_large_weight_mps SKIPPED [0.0002s] (No MPS backend available) [ 62%]
2025-12-04T09:53:09.4198564Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_nan_mps SKIPPED [0.0002s] (No MPS backend available) [ 63%]
2025-12-04T09:53:09.4200077Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_narrow_fallback_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 64%]
2025-12-04T09:53:09.4201991Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_output_path_1_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 65%]
2025-12-04T09:53:09.4203636Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_quantized_linear_bias_none_mps SKIPPED [0.0002s] (No MPS backend available) [ 67%]
2025-12-04T09:53:09.4205286Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_return_view_constant_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 68%]
2025-12-04T09:53:09.4207096Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_run_with_grad_enabled_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 69%]
2025-12-04T09:53:09.4208755Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_dtype_failed_mps SKIPPED [0.0002s] (No MPS backend available) [ 70%]
2025-12-04T09:53:09.4210293Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_runtime_checks_large_mps SKIPPED [0.0002s] (No MPS backend available) [ 71%]
2025-12-04T09:53:09.4212040Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_same_backing_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 72%]
2025-12-04T09:53:09.4213833Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_scatter_reduce_fallback_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 73%]
2025-12-04T09:53:09.4215593Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_seq_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 75%]
2025-12-04T09:53:09.4217286Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_simple_dynamic_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 76%]
2025-12-04T09:53:09.4218963Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_size_with_unbacked_add_expr_transitive_mps SKIPPED [0.0002s] (No MPS backend available) [ 77%]
2025-12-04T09:53:09.4220661Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_expr_indexing_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 78%]
2025-12-04T09:53:09.4222265Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_sym_i64_input_codegen_mps SKIPPED [0.0002s] (No MPS backend available) [ 79%]
2025-12-04T09:53:09.4223894Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_2_num_dims_2_dynamic_False_autotune_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 80%]
2025-12-04T09:53:09.4225718Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_1_dynamic_False_autotune_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 81%]
2025-12-04T09:53:09.4227671Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_grid_type_3_num_dims_2_dynamic_False_autotune_True_mps SKIPPED [0.0004s] (No MPS backend available) [ 82%]
2025-12-04T09:53:09.4229524Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_multi_output_arg_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 84%]
2025-12-04T09:53:09.4231347Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_on_device_tma_dynamic_False_tma_version_old_mps SKIPPED [0.0002s] (No MPS backend available) [ 85%]
2025-12-04T09:53:09.4233321Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_triton_kernel_reinterpret_view_mem_leak_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 86%]
2025-12-04T09:53:09.4235237Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_equals_input_size_runtime_assertion_mark_unbacked_True_mps SKIPPED [0.0002s] (No MPS backend available) [ 87%]
2025-12-04T09:53:09.4237039Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_0_use_static_size_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 88%]
2025-12-04T09:53:09.4238819Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_unbacked_expr_replacements_shift_k_1_use_static_size_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 89%]
2025-12-04T09:53:09.4240672Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_inactive_constant_buffer_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 90%]
2025-12-04T09:53:09.4242641Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_update_user_managed_buffer_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 92%]
2025-12-04T09:53:09.4244462Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_nested_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 93%]
2025-12-04T09:53:09.4246214Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_simple_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 94%]
2025-12-04T09:53:09.4247870Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_conv_dynamic_False_mps SKIPPED [0.0002s] (No MPS backend available) [ 95%]
2025-12-04T09:53:09.4249592Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_while_loop_with_pytree_inputs_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 96%]
2025-12-04T09:53:09.4251403Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_with_profiler_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 97%]
2025-12-04T09:53:09.4253220Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_backed_symbols_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [ 98%]
2025-12-04T09:53:09.4255097Z inductor/test_aot_inductor.py::AOTInductorTestABICompatibleMps::test_zero_grid_with_unbacked_symbols_mps <- test/inductor/test_torchinductor.py SKIPPED [0.0002s] (No MPS backend available) [100%]
2025-12-04T09:53:09.4256137Z 
2025-12-04T09:53:09.4256903Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-471109228b9bc8b1.xml -
2025-12-04T09:53:09.4258055Z ===== 23 passed, 64 skipped, 64 deselected, 1 xfailed in 172.47s (0:02:52) =====
2025-12-04T09:53:09.4259143Z The following tests failed consistently: ['test/inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_empty_cat_dtype_promotion_cuda']
2025-12-04T09:53:09.4259957Z 
2025-12-04T09:53:09.4260526Z FINISHED PRINTING LOG FILE of inductor/test_aot_inductor 4/6 (test/test-reports/inductor.test_aot_inductor_4.6_29241cabee62c0de_.log)
2025-12-04T09:53:09.4261292Z 
2025-12-04T09:53:09.4261647Z Finished inductor/test_aot_inductor 4/6 ... [2025-12-04 09:53:09.315070][2346.924975125], took 8.32min
2025-12-04T09:53:09.4262944Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3469ffb5f6430eac.xml
2025-12-04T09:53:09.6325625Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-c74aaedaf90eea12.xml
2025-12-04T09:53:09.6598249Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-27dd4691bf7b3baf.xml
2025-12-04T09:53:09.7067049Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-471109228b9bc8b1.xml
2025-12-04T09:53:09.9007293Z Uploading logs for 57119749427 to S3
2025-12-04T09:53:09.9281172Z Uploading artifacts took 0.18 seconds
2025-12-04T09:53:09.9281648Z inductor/test_aot_inductor 4/6 failed!
2025-12-04T09:53:09.9286990Z Running inductor/test_torchinductor_dynamic_shapes 1/5 ... [2025-12-04 09:53:09.928522][2347.538431633]
2025-12-04T09:53:09.9287724Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T09:53:09.9292026Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_dynamic_shapes.py', '--shard-id=1', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 09:53:09.928958]
2025-12-04T10:02:00.0315451Z 
2025-12-04T10:02:00.0319392Z inductor/test_torchinductor_dynamic_shapes 1/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_dynamic_shapes_1.5_8dad9aa6fdc82df0_.log
2025-12-04T10:02:00.0534587Z Running 350 items in this shard: test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool_errors_with_long_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_pool_errors_with_long_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex9_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_const_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_addmm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aliased_buffer_reuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_any_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_dtype_device_layout_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_support_str_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_with_persistent_cache_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_argmin_with_nan_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_min_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_argmax_to_float_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_as_strided_on_views_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool3d_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool3d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_batch_norm_2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bitwise3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bitwise_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int64_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_uint8_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_copied_in_graph_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_use_after_remove_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_extern_kernel_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_negative_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_chunk_recompiles_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_clamp_type_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_compar_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_concat_add_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_3d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_constant_pad_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv1d_with_permute_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv3d_channels_last_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_copy_with_scalar_src_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cpu_tensor_with_gpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cummin_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumsum_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_1_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_fixed_layout_sequential_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_scan_op_compiled_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dont_constant_fold_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float16_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_int64_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_embedding_bag_byte_unpack_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_empty_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fallback_mutable_op_no_mutated_tensors_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_flip_cat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_flip_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float_index_expression_type_promotion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_floordiv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fractional_max_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_full_like_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_gather3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_generated_code_has_size_stride_assert_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_argmax_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_no_inputs_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_pad_dynamic_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_refcount_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_horizonal_fusion1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_horizonal_fusion2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_indirect_load_broadcast_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inner_fn_str_and_stride_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_insignificant_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_isinf2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_isinf_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_kernel_names_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_l1_loss_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_strided_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_layer_norm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_channels_last_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_like_rands_sliced_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linalg_eig_stride_consistency_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear_dynamic_maxautotune_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linspace2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linspace4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_dynamic_shape_assertion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_log1p_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_log2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_logsumexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_low_memory_max_pool_dilation_2_dim_3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mark_unbacked_with_hint_override_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d6_dilation_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mix_device_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mm_mixed_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_move_arange_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_mutable_custom_op_fixed_layout2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_new_ones_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_one_hot_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_output_strides_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_philox_rand_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pixel_shuffle_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_bessel_j0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_chebyshev_polynomial_v_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_erfinv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_expit_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_gammaln_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_hermite_polynomial_h_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i1_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_laguerre_polynomial_l_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_modified_bessel_i0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_xlogy_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reduction4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reflection_pad2d_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reflection_pad2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_slice1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_remove_noop_view_default_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_roll_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_round_correctness_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scalar_cpu_tensor_arg_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scaled_dot_product_attention_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_add1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_add2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_reduce3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_unaligned_mask_freezing_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_setitem_with_int_parameter_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sin_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sizehint_issue1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_mutation1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter_dtype_consistency_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_view_with_graph_break_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_backward_data_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sort_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_cumsum_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_stack_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_stride_preservation_with_stride_modifying_fx_pass_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_topk_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unroll_small_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unsigned_constant_tensors_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_upsample_cat_conv_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_var_mean_div_by_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_vdd_clamp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_view_as_real_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_view_detach_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views7_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_weight_norm_bwd_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__dyn_quant_pack_4bit_weight_fp32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test__unsafe_masked_index_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_addmm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_support_out_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_with_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d7_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d_backward4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool3d_backward2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_batch_norm_2d_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bernoulli1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bfloat16_to_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bmm2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int32_int8_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_buffer_use_after_remove_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_float_ndigits_pos_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_float_ndigits_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_builtins_round_int_ndigits_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_negative_dim_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_unbacked_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_clamp_type_promotion_non_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_clone_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_complex_fallback_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_complex_from_real_imag_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_complex_memory_overlap_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_computed_buffer_inlining_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_2d_strides_nonpositive_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv2d_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_bn_fuse_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_convolution2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_convolution5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_copy_non_blocking_is_pinned_use_cat_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_copy_with_scalar_src_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_scalar_with_cpu_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_scalar_with_gpu_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_tensor_with_cpu_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cudnn_rnn_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_inf_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_default_layout_constraint_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dont_constant_fold_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtype_sympy_expr_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float32_float32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_embedding_sparse_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_empty2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_emulate_precision_triton_fp_fusion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_flexible_layout_immutable_free_symbols_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fractional_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_full_like_sliced_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_full_truncation_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_gather2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_getitem_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_both_scalars_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_constant_tensor1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_constant_tensor2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_refcount_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_scalar_inputs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_select_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inductor_assert_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inf_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_flip_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_resize_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_input_mutation4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_int8_weight_only_quant_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_invalid_operand_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_isin_tensor_scalar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_broadcast_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_pointwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_strided_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lite_mode_not_decompose_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lite_triton_kernel_wrapper_functional_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_log_fp64_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_logcumsumexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_long_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_low_memory_max_pool_dilation_1_dim_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_low_memory_max_pool_dilation_2_dim_3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_masked_fill_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_min_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d6_dilation_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multilayer_prime_size_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_narrow_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pad_cast_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pattern_matcher_multi_user_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_philox_rand_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_bessel_j1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_gammaln_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_i0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_modified_bessel_k1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_multigammaln_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_shifted_chebyshev_polynomial_w_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_zeta_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pow3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_prod_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_rand_like_deterministic_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randint_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction_config_limit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_clone_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remove_noop_slice_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_replication_pad_errors_with_bool_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_require_stride_expanded_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter_reduce1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_select_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sgn_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_signbit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_silu_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_mutation1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_softmax_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_failed_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_reduction_dynamic_shape_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_integer_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_stack_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_tensor3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_to_device_constant_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_to_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_topk_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_transpose_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_triton_kernel_bool_param_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_uint4x2_mixed_mm_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unbind_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unfold_zero_dimension_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unspec_inputs_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unsqueeze_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unsqueeze_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_bilinear2d_b_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_nearest3d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views6_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_where_with_logical_op_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_xblock_divides_xnumel_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_arange_dynamic_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_arithmetic_constant_folding_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_cat_unbacked_duplicate_size_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_full_symbolic_value_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_unbacked_stride_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op0_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op3_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_nonzero_no_realloc_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_fallback_specialization_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unwrap_storage_didnt_work_repro_cuda
2025-12-04T10:02:00.0746570Z 
2025-12-04T10:02:00.0747062Z Finished inductor/test_torchinductor_dynamic_shapes 1/5 ... [2025-12-04 10:02:00.032256][2877.642162343], took 8.84min
2025-12-04T10:02:00.0748660Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-d9786e35c31a1406.xml
2025-12-04T10:02:00.1338993Z Running inductor/test_torchinductor_dynamic_shapes 5/5 ... [2025-12-04 10:02:00.133578][2877.743485122]
2025-12-04T10:02:00.1339676Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:02:00.1342655Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_dynamic_shapes.py', '--shard-id=5', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:02:00.134016]
2025-12-04T10:11:03.9776599Z 
2025-12-04T10:11:03.9777861Z inductor/test_torchinductor_dynamic_shapes 5/5 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_dynamic_shapes_5.5_0c7fd80a5a340f9b_.log
2025-12-04T10:11:04.0001355Z Running 370 items in this shard: test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__dyn_quant_matmul_4bit_bf16_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test__unsafe_masked_index_put_accumulate_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_adaptive_avg_pool_with_output_size_0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_add_complex8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_alexnet_prefix_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_angle_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_aoti_eager_with_scalar_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_arange1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool3d_backward2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_avg_pool_errors_with_uint_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_baddbmm_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bmm2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_int64_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_bucketize_int_uint8_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_batch_norm_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_buffer_copied_in_graph_with_different_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_builtins_round_int_ndigits_pos_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cat_unbacked_legacy_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_complex_from_real_imag_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv2d_backward_channels_last_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_conv_bn_fuse_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution1_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_convolution5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_copy_non_blocking_is_pinned_use_cat_False_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cpu_scalar_with_gpu_tensor_dynamic_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cpu_scalar_with_gpu_tensor_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_cumsum_pattern_matcher_issue_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_default_layout_constraint_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_custom_op_unbacked_symints_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dense_mask_index_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_deterministic_codegen_on_graph_break_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_diagonal_copy_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dist_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_div_by_zero_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dropout_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dropout_trivial_0_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float32_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_float64_int64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int16_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int32_int8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_int8_int32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_dtypeview_uint8_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_embedding_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_embedding_sparse_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_emulate_precision_triton_fp_fusion_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_erfc_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_erfinv_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_flexible_layout_immutable_free_symbols_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_float_repr_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fmod_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_fmod_zero_dim_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_forced_buffer_realize_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_generate_rand_fp8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_constant_tensor2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_graph_partition_mutation_real_name_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_hardsigmoid_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_hardswish_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_propagation_flip_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_index_put_failed_reinplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inductor_assert_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inductor_multiple_specializations_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inductor_triton_bucketize_respects_masking_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_inplace_flip_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_input_mutation1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_input_mutation2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_issue102546_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_large_broadcast_reduction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lgamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_linear_mixed_dtype_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_mode_not_decompose_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_regional_compile_flex_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_lite_regional_compile_repeated_blocks_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_log_fp64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_log_softmax_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_logaddexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_logcumsumexp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_low_memory_max_pool_dilation_1_dim_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_min_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d5_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d7_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_max_pool2d_with_indices_backward4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_multilayer_any_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_narrow_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_neg_max_uint8_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_new_empty_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_nll_loss_backward_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pad_view_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pattern_matcher_multi_user_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_permute2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_airy_ai_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_bessel_y0_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_digamma_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_erfc_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_exp2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_gammaincc_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_i1e_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_logit_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_multigammaln_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_psi_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pointwise_shifted_chebyshev_polynomial_u_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow_by_natural_log2_dynamic_shapes_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow_int_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_pow_symfloat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_prod_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randn_like_empty_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_randn_with_dtype_and_device_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_reduction_config_limit_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_as_strided_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int32_nd_1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_Tensor_decomp_int64_nd_2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_repeat_interleave_decomposition_has_clamp_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_resize_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_rsqrt_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scalar_input_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scalar_output_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scaled_dot_product_efficient_attention_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter6_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scatter_add3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_scheduler_vertical_fusion1_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sdpa_prefer_nd_tiling_False_use_block_ptr_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_searchsorted_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_sigmoid_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_mutation2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_mutation3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_slice_scatter4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_softmax_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_cumprod_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_split_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tan_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tensor1_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tensor2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tensor_index_put_slice_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tmp_not_defined_issue2_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_tmp_not_defined_issue3_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_to_memory_format_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unbacked_floordiv_simplify_errors_dynamic_shapes_cpu, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_float16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_float32_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_float64_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unspec_inputs_int16_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_unsqueeze_inplace_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_var_correction_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_var_mean_tile_reduction_True_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_views4_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_weight_norm_conv2d_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_where_broadcast_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_where_with_logical_op_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesCpuTests::test_zero_element_mutation_dynamic_shapes_cpu, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adaptive_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_complex8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_add_const_float_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_adding_tensor_offsets_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_addmv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_aoti_eager_cache_hit_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_arange6_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_argmax_argmin3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_as_strided_on_views_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_assert_size_stride_op_name_fail_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool2d_backward3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_avg_pool3d_backward4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bitwise2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bitwise_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bool_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_both_scalars_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_add_autotune_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_computed_offsets_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int16_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int64_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_int8_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_int_uint8_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_bucketize_nd_tiling_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_of_loops_and_extern_kernel_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cat_single_empty_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_concat_add_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_config_option_dont_assume_alignment_recompiles_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_consecutive_split_cumsum_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_constant_pad_nd_inplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv3d_channels_last_use_block_ptr_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_shape_check_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_conv_with_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_copy_non_blocking_is_pinned_use_cat_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_scalar_with_cpu_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_scalar_with_gpu_tensor_cpp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cpu_tensor_with_gpu_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_cumsum_pattern_matcher_issue_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_custom_op_fixed_layout_sequential_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dist_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div9_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_by_zero_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_presicion_accuracy_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_div_softmax_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dropout_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float16_int16_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float16_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_float16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_float64_int64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int16_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int32_int32_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int64_int8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_int8_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_float64_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_dtypeview_uint8_int8_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_empty_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_exp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expand_as_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expanded_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_expm1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_basic_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fallback_mutable_op_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fft_real_input_real_output_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_flip_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float16_to_int16_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float_index_expression_type_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_float_repr_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_floordiv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fractional_max_pool2d1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_full_like_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_fusing_write_into_disjoint_read_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_gather3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_gpu_scalar_with_cpu_tensor_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_arange2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_graph_partition_misaligned_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_grid_sampler_2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_hardsigmoid_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_horizonal_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_propagation_abs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_propagation_floordiv_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_fallback2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_index_put_reinplace_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inner_reduction_detection_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_inplace_add_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_insignificant_strides_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_kwargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_large_block_sizes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_like_rands_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_linear_dynamic_maxautotune_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_lite_regional_compile_invoke_subgraph_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_log1p_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_logsumexp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mark_unbacked_with_hint_override_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_masked_fill_promotion_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_max_pool2d_with_indices_backward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mean_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_misaligned_address_issue1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mix_device_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mixed_mm3_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mm_mixed_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mm_views_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mul_index_expr_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_mul_softmax_symfloat_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_gpu_device_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_multi_threading_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nan_sort_stable_False_descending_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_neg_index_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_neg_max_uint8_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_nll_loss_forward_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pixel_shuffle_channels_last_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_bessel_y1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_chebyshev_polynomial_u_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_entr_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_exp2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_hermite_polynomial_h_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_i1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_laguerre_polynomial_l_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_log1p_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_polygamma_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_psi_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_scaled_modified_bessel_k0_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_shifted_chebyshev_polynomial_v_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pointwise_xlog1py_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_polar_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_pow_int_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_prepare_softmax_with_fast_math_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_profiler_mark_wrapper_call_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_randint_distribution_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reduction5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_reflection_pad2d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_remainder_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_as_strided_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_Tensor_decomp_int32_nd_2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_repeat_interleave_decomposition_has_clamp_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_roll_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_rsqrt_dynamic_shapes_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scalar_input_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scaled_dot_product_efficient_attention_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_scatter5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sdpa_prefer_nd_tiling_True_use_block_ptr_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_setitem_with_int_parameter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sgn_extremal_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice2_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter_dtype_consistency_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_slice_scatter_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sort_stable_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_cumsum_index_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_reduction_with_int64_size_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_list_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_split_with_sizes_with_unbacked_symints_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_squeeze_varargs_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_std_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum3_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum5_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum_dtype_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_sum_keepdims_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_unroll_small_reduction_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_upsample_nearest1d_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_var_mean_tile_reduction_False_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_var_mean_tile_reduction_True_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_vertical_fusion1_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views4_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_views5_dynamic_shapes_cuda, 
test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_weight_norm_bwd_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::DynamicShapesGPUTests::test_zeros_dynamic_shapes_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_adaptive_max_pool3d_with_indices_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_float_is_integer_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_float_item_inf_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_item_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op4_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op5_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_math_ops_op7_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_nonzero_size_factory_nobreak_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_pad_dynamic_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_slice_index_changing_sign_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_sub_constant_folding_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_cat_backwards_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unbacked_save_for_backwards_cuda, test/inductor/test_torchinductor_dynamic_shapes.py::TestInductorDynamicCUDA::test_unspecialized_float_softshrink_cuda
2025-12-04T10:11:04.0219308Z 
2025-12-04T10:11:04.0219769Z Finished inductor/test_torchinductor_dynamic_shapes 5/5 ... [2025-12-04 10:11:03.978540][3421.588446977], took 9.06min
2025-12-04T10:11:04.0221315Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-334d9946fa595278.xml
2025-12-04T10:11:04.0743656Z Running inductor/test_kernel_benchmark 1/1 ... [2025-12-04 10:11:04.074090][3421.683997583]
2025-12-04T10:11:04.0744249Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:11:04.0747683Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_kernel_benchmark.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:11:04.074508]
2025-12-04T10:15:47.8002433Z 
2025-12-04T10:15:47.8003407Z PRINTING LOG FILE of inductor/test_kernel_benchmark 1/1 (test/test-reports/inductor.test_kernel_benchmark_1.1_1e5eee0d44ae0f1a_.log)
2025-12-04T10:15:47.8004881Z W1204 10:11:12.599000 34099 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:15:47.8006638Z Test results will be stored in test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-96b82d738bd32122.xml
2025-12-04T10:15:47.8007943Z ============================= test session starts ==============================
2025-12-04T10:15:47.8008638Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:15:47.8009291Z cachedir: .pytest_cache
2025-12-04T10:15:47.8009991Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:15:47.8010758Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:15:47.8011105Z configfile: pytest.ini
2025-12-04T10:15:47.8011814Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:15:47.8012595Z collecting ... collected 18 items
2025-12-04T10:15:47.8012984Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T10:15:47.8021823Z Running 18 items in this shard: test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_fused_layernorm_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_matmul_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_matmul_triton_kernel_benchmark, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_slice_add_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_slice_add_bandwidth_computation_2, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_triton_kernel_benchmark, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_pw_kernel_benchmark, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_reduction_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_multiple_kernels, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_scalar, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_templates, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_add_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_add_cat_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_split_scan, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_star_dep, test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_unused_input_bandwidth_computation
2025-12-04T10:15:47.8030911Z 
2025-12-04T10:15:47.8031428Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_fused_layernorm_bandwidth_computation PASSED [20.4646s] [  5%]
2025-12-04T10:15:47.8032998Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_matmul_bandwidth_computation W1204 10:11:34.107000 34099 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:15:47.8034145Z PASSED [13.4783s] [ 11%]
2025-12-04T10:15:47.8035108Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_matmul_triton_kernel_benchmark SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 16%]
2025-12-04T10:15:47.8036488Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_slice_add_bandwidth_computation PASSED [13.1103s] [ 22%]
2025-12-04T10:15:47.8037765Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_slice_add_bandwidth_computation_2 PASSED [12.9525s] [ 27%]
2025-12-04T10:15:47.8040200Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_mm_triton_kernel_benchmark SKIPPED [0.0008s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/118346 for platform(s) linux, slow. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 33%]
2025-12-04T10:15:47.8042851Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_pw_kernel_benchmark PASSED [13.4593s] [ 38%]
2025-12-04T10:15:47.8043897Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_reduction_bandwidth_computation PASSED [13.3010s] [ 44%]
2025-12-04T10:15:47.8044932Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps PASSED [21.5735s] [ 50%]
2025-12-04T10:15:47.8046015Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_multiple_kernels PASSED [24.0462s] [ 55%]
2025-12-04T10:15:47.8047116Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_scalar PASSED [21.6602s] [ 61%]
2025-12-04T10:15:47.8048480Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_remove_inductor_deps_templates SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [ 66%]
2025-12-04T10:15:47.8049851Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_add_bandwidth_computation PASSED [13.2732s] [ 72%]
2025-12-04T10:15:47.8050961Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_add_cat_bandwidth_computation PASSED [13.0895s] [ 77%]
2025-12-04T10:15:47.8052148Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation ('RERUN', {'yellow': True}) [0.1448s] [ 83%]
2025-12-04T10:15:47.8053408Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation ('RERUN', {'yellow': True}) [0.1044s] [ 83%]
2025-12-04T10:15:47.8054562Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation FAILED [0.1043s] [ 83%]
2025-12-04T10:15:47.8055175Z 
2025-12-04T10:15:47.8055315Z ==================================== RERUNS ====================================
2025-12-04T10:15:47.8055880Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________
2025-12-04T10:15:47.8056421Z Traceback (most recent call last):
2025-12-04T10:15:47.8057185Z   File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8057975Z     out = f(*inputs)
2025-12-04T10:15:47.8058624Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:15:47.8059481Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:15:47.8060362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:15:47.8061193Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:15:47.8062014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:15:47.8062797Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:15:47.8063585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:15:47.8064573Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:15:47.8065543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:15:47.8066297Z     graph.run(*example_inputs)
2025-12-04T10:15:47.8066990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:15:47.8067641Z     return super().run(*args)
2025-12-04T10:15:47.8068246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:15:47.8068887Z     self.env[node] = self.run_node(node)
2025-12-04T10:15:47.8069560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:15:47.8070240Z     result = super().run_node(n)
2025-12-04T10:15:47.8070865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:15:47.8071653Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:15:47.8072392Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:15:47.8073222Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:15:47.8074037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:15:47.8074834Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:15:47.8075609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:15:47.8076299Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:15:47.8076959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:15:47.8077661Z     return autotune_select_algorithm(
2025-12-04T10:15:47.8078473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:15:47.8079282Z     return cache(*args, **kwargs)
2025-12-04T10:15:47.8079987Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:15:47.8080872Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:15:47.8082586Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8084059Z   target: aten.mm.default
2025-12-04T10:15:47.8084356Z   args[0]: TensorBox(
2025-12-04T10:15:47.8084635Z     ReinterpretView(
2025-12-04T10:15:47.8084915Z       StorageBox(
2025-12-04T10:15:47.8085457Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8086080Z       ),
2025-12-04T10:15:47.8086499Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8087029Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8087366Z       stack_traces = {,
2025-12-04T10:15:47.8087924Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8088535Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8088866Z       ,
2025-12-04T10:15:47.8089080Z       }
2025-12-04T10:15:47.8089278Z     )
2025-12-04T10:15:47.8089489Z   )
2025-12-04T10:15:47.8089730Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8090331Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8090948Z   ))
2025-12-04T10:15:47.8091082Z 
2025-12-04T10:15:47.8091782Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8092604Z 
2025-12-04T10:15:47.8092609Z 
2025-12-04T10:15:47.8092832Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8093840Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8094546Z 
2025-12-04T10:15:47.8094806Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8095430Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8095899Z frames [('total', 1)]
2025-12-04T10:15:47.8096175Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8096607Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8097139Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8097502Z graph_break []
2025-12-04T10:15:47.8097873Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8098515Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________
2025-12-04T10:15:47.8099141Z Traceback (most recent call last):
2025-12-04T10:15:47.8099908Z   File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8100677Z     out = f(*inputs)
2025-12-04T10:15:47.8101542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:15:47.8102401Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:15:47.8103271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:15:47.8104100Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:15:47.8104931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:15:47.8105700Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:15:47.8106493Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:15:47.8107469Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:15:47.8108434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:15:47.8109183Z     graph.run(*example_inputs)
2025-12-04T10:15:47.8109788Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:15:47.8110431Z     return super().run(*args)
2025-12-04T10:15:47.8111014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:15:47.8111665Z     self.env[node] = self.run_node(node)
2025-12-04T10:15:47.8112341Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:15:47.8113018Z     result = super().run_node(n)
2025-12-04T10:15:47.8113648Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:15:47.8114376Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:15:47.8115118Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:15:47.8115950Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:15:47.8116756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:15:47.8117555Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:15:47.8118340Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:15:47.8119010Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:15:47.8119682Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:15:47.8120381Z     return autotune_select_algorithm(
2025-12-04T10:15:47.8121365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:15:47.8122232Z     return cache(*args, **kwargs)
2025-12-04T10:15:47.8122948Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:15:47.8123828Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:15:47.8125476Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8127040Z   target: aten.mm.default
2025-12-04T10:15:47.8127339Z   args[0]: TensorBox(
2025-12-04T10:15:47.8127621Z     ReinterpretView(
2025-12-04T10:15:47.8127881Z       StorageBox(
2025-12-04T10:15:47.8128446Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8129066Z       ),
2025-12-04T10:15:47.8129477Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8130031Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8130375Z       stack_traces = {,
2025-12-04T10:15:47.8130932Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8142257Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8142740Z       ,
2025-12-04T10:15:47.8142951Z       }
2025-12-04T10:15:47.8143173Z     )
2025-12-04T10:15:47.8143391Z   )
2025-12-04T10:15:47.8143623Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8144253Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8144878Z   ))
2025-12-04T10:15:47.8144998Z 
2025-12-04T10:15:47.8145711Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8146552Z 
2025-12-04T10:15:47.8146557Z 
2025-12-04T10:15:47.8146769Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8147692Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8148401Z 
2025-12-04T10:15:47.8148676Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8149305Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8149751Z frames [('total', 1)]
2025-12-04T10:15:47.8150040Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8150475Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8150936Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8151269Z graph_break []
2025-12-04T10:15:47.8151550Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8152003Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8152456Z frames [('total', 1)]
2025-12-04T10:15:47.8152742Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8153157Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8153627Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8153963Z graph_break []
2025-12-04T10:15:47.8154231Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8154624Z =================================== FAILURES ===================================
2025-12-04T10:15:47.8155191Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________
2025-12-04T10:15:47.8155731Z Traceback (most recent call last):
2025-12-04T10:15:47.8156495Z   File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8157389Z     out = f(*inputs)
2025-12-04T10:15:47.8158049Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:15:47.8158898Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:15:47.8159782Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:15:47.8160611Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:15:47.8161497Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:15:47.8162346Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:15:47.8163147Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:15:47.8164130Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:15:47.8165094Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:15:47.8165844Z     graph.run(*example_inputs)
2025-12-04T10:15:47.8166450Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:15:47.8167089Z     return super().run(*args)
2025-12-04T10:15:47.8167672Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:15:47.8168332Z     self.env[node] = self.run_node(node)
2025-12-04T10:15:47.8169005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:15:47.8169685Z     result = super().run_node(n)
2025-12-04T10:15:47.8170308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:15:47.8171040Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:15:47.8171782Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:15:47.8172609Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:15:47.8173420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:15:47.8174220Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:15:47.8175002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:15:47.8175676Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:15:47.8176349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:15:47.8177054Z     return autotune_select_algorithm(
2025-12-04T10:15:47.8177867Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:15:47.8178669Z     return cache(*args, **kwargs)
2025-12-04T10:15:47.8179375Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:15:47.8180252Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:15:47.8181885Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8183359Z   target: aten.mm.default
2025-12-04T10:15:47.8183654Z   args[0]: TensorBox(
2025-12-04T10:15:47.8183933Z     ReinterpretView(
2025-12-04T10:15:47.8184189Z       StorageBox(
2025-12-04T10:15:47.8184824Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8185454Z       ),
2025-12-04T10:15:47.8185866Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8186420Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8186764Z       stack_traces = {,
2025-12-04T10:15:47.8187318Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8187933Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8188344Z       ,
2025-12-04T10:15:47.8188566Z       }
2025-12-04T10:15:47.8188767Z     )
2025-12-04T10:15:47.8188984Z   )
2025-12-04T10:15:47.8189225Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8189824Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8190444Z   ))
2025-12-04T10:15:47.8190563Z 
2025-12-04T10:15:47.8191277Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8192102Z 
2025-12-04T10:15:47.8192106Z 
2025-12-04T10:15:47.8192330Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8193235Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8193954Z 
2025-12-04T10:15:47.8194216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8194843Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8195296Z frames [('total', 1)]
2025-12-04T10:15:47.8195568Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8195992Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8196464Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8196782Z graph_break []
2025-12-04T10:15:47.8197064Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8197527Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8197967Z frames [('total', 1)]
2025-12-04T10:15:47.8198253Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8198680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8199156Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8199476Z graph_break []
2025-12-04T10:15:47.8199749Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8200224Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8200663Z frames [('total', 1)]
2025-12-04T10:15:47.8201123Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8201556Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8202010Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8202399Z graph_break []
2025-12-04T10:15:47.8202682Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8203717Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-96b82d738bd32122.xml -
2025-12-04T10:15:47.8204780Z =========================== short test summary info ============================
2025-12-04T10:15:47.8206865Z FAILED [0.1043s] inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8208870Z   target: aten.mm.default
2025-12-04T10:15:47.8209168Z   args[0]: TensorBox(
2025-12-04T10:15:47.8209435Z     ReinterpretView(
2025-12-04T10:15:47.8209711Z       StorageBox(
2025-12-04T10:15:47.8210394Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8211012Z       ),
2025-12-04T10:15:47.8211435Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8211985Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8212308Z       stack_traces = {,
2025-12-04T10:15:47.8212862Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8213493Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8213904Z       ,
2025-12-04T10:15:47.8214109Z       }
2025-12-04T10:15:47.8214324Z     )
2025-12-04T10:15:47.8214539Z   )
2025-12-04T10:15:47.8214766Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8215378Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8215993Z   ))
2025-12-04T10:15:47.8216112Z 
2025-12-04T10:15:47.8216814Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8217653Z 
2025-12-04T10:15:47.8217657Z 
2025-12-04T10:15:47.8217868Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8218786Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8219487Z 
2025-12-04T10:15:47.8219759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8220342Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:15:47.8220867Z ========= 1 failed, 11 passed, 3 skipped, 2 rerun in 180.82s (0:03:00) =========
2025-12-04T10:15:47.8221320Z Got exit code 1
2025-12-04T10:15:47.8221579Z Retrying single test...
2025-12-04T10:15:47.8222189Z W1204 10:14:24.778000 35817 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:15:47.8223354Z Test results will be stored in test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-5c40f8a5eb55b478.xml
2025-12-04T10:15:47.8224246Z ============================= test session starts ==============================
2025-12-04T10:15:47.8224889Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:15:47.8225465Z cachedir: .pytest_cache
2025-12-04T10:15:47.8226154Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:15:47.8226919Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:15:47.8227248Z configfile: pytest.ini
2025-12-04T10:15:47.8227957Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:15:47.8228833Z collecting ... collected 18 items / 17 deselected / 1 selected
2025-12-04T10:15:47.8229843Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8230732Z Running 1 items in this shard
2025-12-04T10:15:47.8230954Z 
2025-12-04T10:15:47.8231867Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation W1204 10:14:29.226000 35817 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:15:47.8233040Z ('RERUN', {'yellow': True}) [4.5730s] [100%]
2025-12-04T10:15:47.8233862Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation ('RERUN', {'yellow': True}) [0.1036s] [100%]
2025-12-04T10:15:47.8235022Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation FAILED [0.1038s] [100%]
2025-12-04T10:15:47.8235635Z 
2025-12-04T10:15:47.8235838Z ==================================== RERUNS ====================================
2025-12-04T10:15:47.8236405Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________
2025-12-04T10:15:47.8236942Z Traceback (most recent call last):
2025-12-04T10:15:47.8237702Z   File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8238472Z     out = f(*inputs)
2025-12-04T10:15:47.8239120Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:15:47.8240021Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:15:47.8240905Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:15:47.8241732Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:15:47.8242621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:15:47.8243395Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:15:47.8244197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:15:47.8245177Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:15:47.8246149Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:15:47.8246907Z     graph.run(*example_inputs)
2025-12-04T10:15:47.8247517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:15:47.8248163Z     return super().run(*args)
2025-12-04T10:15:47.8248750Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:15:47.8249401Z     self.env[node] = self.run_node(node)
2025-12-04T10:15:47.8250077Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:15:47.8250754Z     result = super().run_node(n)
2025-12-04T10:15:47.8251378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:15:47.8252102Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:15:47.8252846Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:15:47.8253664Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:15:47.8254481Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:15:47.8255277Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:15:47.8256053Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:15:47.8256730Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:15:47.8257406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:15:47.8258100Z     return autotune_select_algorithm(
2025-12-04T10:15:47.8258910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:15:47.8259720Z     return cache(*args, **kwargs)
2025-12-04T10:15:47.8260430Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:15:47.8261305Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:15:47.8263048Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8264537Z   target: aten.mm.default
2025-12-04T10:15:47.8264820Z   args[0]: TensorBox(
2025-12-04T10:15:47.8265099Z     ReinterpretView(
2025-12-04T10:15:47.8265369Z       StorageBox(
2025-12-04T10:15:47.8265912Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8266531Z       ),
2025-12-04T10:15:47.8266957Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8267547Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8267888Z       stack_traces = {,
2025-12-04T10:15:47.8268437Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8269060Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8269378Z       ,
2025-12-04T10:15:47.8269595Z       }
2025-12-04T10:15:47.8269807Z     )
2025-12-04T10:15:47.8270001Z   )
2025-12-04T10:15:47.8270240Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8270851Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8271452Z   ))
2025-12-04T10:15:47.8271583Z 
2025-12-04T10:15:47.8272277Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8273115Z 
2025-12-04T10:15:47.8273120Z 
2025-12-04T10:15:47.8273334Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8274260Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8274965Z 
2025-12-04T10:15:47.8275225Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8275847Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8276309Z frames [('total', 1)]
2025-12-04T10:15:47.8276596Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8276913Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8277365Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8277814Z graph_break []
2025-12-04T10:15:47.8278077Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8278613Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________
2025-12-04T10:15:47.8279157Z Traceback (most recent call last):
2025-12-04T10:15:47.8279919Z   File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8280690Z     out = f(*inputs)
2025-12-04T10:15:47.8281335Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:15:47.8282253Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:15:47.8283131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:15:47.8283960Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:15:47.8284785Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:15:47.8285566Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:15:47.8286352Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:15:47.8287332Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:15:47.8288295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:15:47.8289047Z     graph.run(*example_inputs)
2025-12-04T10:15:47.8289752Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:15:47.8290396Z     return super().run(*args)
2025-12-04T10:15:47.8290992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:15:47.8291628Z     self.env[node] = self.run_node(node)
2025-12-04T10:15:47.8292298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:15:47.8292974Z     result = super().run_node(n)
2025-12-04T10:15:47.8293663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:15:47.8294383Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:15:47.8295127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:15:47.8295957Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:15:47.8296768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:15:47.8297571Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:15:47.8298345Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:15:47.8299028Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:15:47.8299686Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:15:47.8300392Z     return autotune_select_algorithm(
2025-12-04T10:15:47.8301344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:15:47.8302148Z     return cache(*args, **kwargs)
2025-12-04T10:15:47.8302853Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:15:47.8303740Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:15:47.8305385Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8306853Z   target: aten.mm.default
2025-12-04T10:15:47.8307152Z   args[0]: TensorBox(
2025-12-04T10:15:47.8307432Z     ReinterpretView(
2025-12-04T10:15:47.8307707Z       StorageBox(
2025-12-04T10:15:47.8308252Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8308873Z       ),
2025-12-04T10:15:47.8309295Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8309825Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8310167Z       stack_traces = {,
2025-12-04T10:15:47.8310719Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8311330Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8311657Z       ,
2025-12-04T10:15:47.8311871Z       }
2025-12-04T10:15:47.8312070Z     )
2025-12-04T10:15:47.8312285Z   )
2025-12-04T10:15:47.8312525Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8313125Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8313744Z   ))
2025-12-04T10:15:47.8313880Z 
2025-12-04T10:15:47.8314577Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8315399Z 
2025-12-04T10:15:47.8315403Z 
2025-12-04T10:15:47.8315625Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8316655Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8317356Z 
2025-12-04T10:15:47.8317617Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8318240Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8318699Z frames [('total', 1)]
2025-12-04T10:15:47.8318979Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8319310Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8319837Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8320288Z graph_break []
2025-12-04T10:15:47.8320549Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8321010Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8321465Z frames [('total', 1)]
2025-12-04T10:15:47.8321737Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8322219Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8322698Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8323014Z graph_break []
2025-12-04T10:15:47.8323290Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8323690Z =================================== FAILURES ===================================
2025-12-04T10:15:47.8324241Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________
2025-12-04T10:15:47.8324782Z Traceback (most recent call last):
2025-12-04T10:15:47.8325569Z   File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8326346Z     out = f(*inputs)
2025-12-04T10:15:47.8326982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:15:47.8327836Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:15:47.8328720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:15:47.8329547Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:15:47.8330349Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:15:47.8331125Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:15:47.8331921Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:15:47.8332890Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:15:47.8333857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:15:47.8334617Z     graph.run(*example_inputs)
2025-12-04T10:15:47.8335228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:15:47.8335856Z     return super().run(*args)
2025-12-04T10:15:47.8336456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:15:47.8337109Z     self.env[node] = self.run_node(node)
2025-12-04T10:15:47.8337768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:15:47.8338443Z     result = super().run_node(n)
2025-12-04T10:15:47.8339088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:15:47.8339811Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:15:47.8340537Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:15:47.8341357Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:15:47.8342248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:15:47.8343055Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:15:47.8343819Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:15:47.8344508Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:15:47.8345180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:15:47.8345924Z     return autotune_select_algorithm(
2025-12-04T10:15:47.8346740Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:15:47.8347557Z     return cache(*args, **kwargs)
2025-12-04T10:15:47.8348261Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:15:47.8349254Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:15:47.8350897Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8352374Z   target: aten.mm.default
2025-12-04T10:15:47.8352664Z   args[0]: TensorBox(
2025-12-04T10:15:47.8352928Z     ReinterpretView(
2025-12-04T10:15:47.8353202Z       StorageBox(
2025-12-04T10:15:47.8353755Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8354358Z       ),
2025-12-04T10:15:47.8354779Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8355324Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8355652Z       stack_traces = {,
2025-12-04T10:15:47.8356211Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8356837Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8357173Z       ,
2025-12-04T10:15:47.8357380Z       }
2025-12-04T10:15:47.8357594Z     )
2025-12-04T10:15:47.8357812Z   )
2025-12-04T10:15:47.8358037Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8358650Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8359268Z   ))
2025-12-04T10:15:47.8359384Z 
2025-12-04T10:15:47.8360083Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8360914Z 
2025-12-04T10:15:47.8360919Z 
2025-12-04T10:15:47.8361132Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8362055Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8362821Z 
2025-12-04T10:15:47.8363092Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8363712Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8364161Z frames [('total', 1)]
2025-12-04T10:15:47.8364455Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8364788Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8365235Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8365687Z graph_break []
2025-12-04T10:15:47.8365964Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8366421Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8366874Z frames [('total', 1)]
2025-12-04T10:15:47.8367161Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8367660Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8368139Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8368473Z graph_break []
2025-12-04T10:15:47.8368752Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8369202Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8369651Z frames [('total', 1)]
2025-12-04T10:15:47.8369934Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8370343Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8370867Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8371193Z graph_break []
2025-12-04T10:15:47.8371469Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8372498Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-5c40f8a5eb55b478.xml -
2025-12-04T10:15:47.8373591Z =========================== short test summary info ============================
2025-12-04T10:15:47.8375679Z FAILED [0.1038s] inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8377699Z   target: aten.mm.default
2025-12-04T10:15:47.8377981Z   args[0]: TensorBox(
2025-12-04T10:15:47.8378253Z     ReinterpretView(
2025-12-04T10:15:47.8378523Z       StorageBox(
2025-12-04T10:15:47.8379055Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8379671Z       ),
2025-12-04T10:15:47.8380086Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8380623Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8380950Z       stack_traces = {,
2025-12-04T10:15:47.8381497Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8382119Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8382434Z       ,
2025-12-04T10:15:47.8382639Z       }
2025-12-04T10:15:47.8382839Z     )
2025-12-04T10:15:47.8383031Z   )
2025-12-04T10:15:47.8383259Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8383865Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8384468Z   ))
2025-12-04T10:15:47.8384591Z 
2025-12-04T10:15:47.8385286Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8386120Z 
2025-12-04T10:15:47.8386124Z 
2025-12-04T10:15:47.8386334Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8387253Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8387955Z 
2025-12-04T10:15:47.8388224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8388789Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:15:47.8389295Z ================== 1 failed, 17 deselected, 2 rerun in 4.81s ===================
2025-12-04T10:15:47.8389724Z Got exit code 1
2025-12-04T10:15:47.8389978Z Retrying single test...
2025-12-04T10:15:47.8390593Z W1204 10:14:43.472000 35986 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:15:47.8391753Z Test results will be stored in test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ab540be19127662e.xml
2025-12-04T10:15:47.8392647Z ============================= test session starts ==============================
2025-12-04T10:15:47.8393374Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:15:47.8393957Z cachedir: .pytest_cache
2025-12-04T10:15:47.8394656Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:15:47.8395407Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:15:47.8395749Z configfile: pytest.ini
2025-12-04T10:15:47.8396459Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:15:47.8397385Z collecting ... collected 18 items / 17 deselected / 1 selected
2025-12-04T10:15:47.8398375Z stepcurrent: skipping 14 already run items. Running only test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8399269Z Running 1 items in this shard
2025-12-04T10:15:47.8399476Z 
2025-12-04T10:15:47.8400402Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation W1204 10:14:47.897000 35986 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:15:47.8401745Z ('RERUN', {'yellow': True}) [4.5480s] [100%]
2025-12-04T10:15:47.8402623Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation ('RERUN', {'yellow': True}) [0.1053s] [100%]
2025-12-04T10:15:47.8403980Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation FAILED [0.1042s] [100%]
2025-12-04T10:15:47.8404590Z 
2025-12-04T10:15:47.8404744Z ==================================== RERUNS ====================================
2025-12-04T10:15:47.8405308Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________
2025-12-04T10:15:47.8405836Z Traceback (most recent call last):
2025-12-04T10:15:47.8406617Z   File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8407388Z     out = f(*inputs)
2025-12-04T10:15:47.8408020Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:15:47.8408873Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:15:47.8409755Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:15:47.8410577Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:15:47.8411384Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:15:47.8412160Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:15:47.8412953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:15:47.8413928Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:15:47.8414881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:15:47.8415636Z     graph.run(*example_inputs)
2025-12-04T10:15:47.8416239Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:15:47.8416868Z     return super().run(*args)
2025-12-04T10:15:47.8417459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:15:47.8418115Z     self.env[node] = self.run_node(node)
2025-12-04T10:15:47.8418783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:15:47.8419443Z     result = super().run_node(n)
2025-12-04T10:15:47.8420075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:15:47.8420929Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:15:47.8421656Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:15:47.8422482Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:15:47.8423294Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:15:47.8424093Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:15:47.8424931Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:15:47.8425612Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:15:47.8426279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:15:47.8426976Z     return autotune_select_algorithm(
2025-12-04T10:15:47.8427773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:15:47.8428588Z     return cache(*args, **kwargs)
2025-12-04T10:15:47.8429291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:15:47.8430151Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:15:47.8431795Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8433287Z   target: aten.mm.default
2025-12-04T10:15:47.8433578Z   args[0]: TensorBox(
2025-12-04T10:15:47.8433835Z     ReinterpretView(
2025-12-04T10:15:47.8434099Z       StorageBox(
2025-12-04T10:15:47.8434653Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8435258Z       ),
2025-12-04T10:15:47.8435676Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8436222Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8436565Z       stack_traces = {,
2025-12-04T10:15:47.8437097Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8437713Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8438045Z       ,
2025-12-04T10:15:47.8438244Z       }
2025-12-04T10:15:47.8438450Z     )
2025-12-04T10:15:47.8438658Z   )
2025-12-04T10:15:47.8438879Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8439493Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8440104Z   ))
2025-12-04T10:15:47.8440220Z 
2025-12-04T10:15:47.8440938Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8441760Z 
2025-12-04T10:15:47.8441765Z 
2025-12-04T10:15:47.8441980Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8442972Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8443694Z 
2025-12-04T10:15:47.8443956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8444585Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8445030Z frames [('total', 1)]
2025-12-04T10:15:47.8445324Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8445659Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8446104Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8446553Z graph_break []
2025-12-04T10:15:47.8446899Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8447422Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________
2025-12-04T10:15:47.8447965Z Traceback (most recent call last):
2025-12-04T10:15:47.8448742Z   File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8449513Z     out = f(*inputs)
2025-12-04T10:15:47.8450154Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:15:47.8451068Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:15:47.8451950Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:15:47.8452777Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:15:47.8453595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:15:47.8454374Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:15:47.8455174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:15:47.8456141Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:15:47.8457111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:15:47.8457877Z     graph.run(*example_inputs)
2025-12-04T10:15:47.8458490Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:15:47.8459122Z     return super().run(*args)
2025-12-04T10:15:47.8459718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:15:47.8460373Z     self.env[node] = self.run_node(node)
2025-12-04T10:15:47.8461040Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:15:47.8461715Z     result = super().run_node(n)
2025-12-04T10:15:47.8462353Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:15:47.8463073Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:15:47.8463804Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:15:47.8464639Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:15:47.8465459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:15:47.8466263Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:15:47.8467037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:15:47.8467723Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:15:47.8468394Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:15:47.8469085Z     return autotune_select_algorithm(
2025-12-04T10:15:47.8469890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:15:47.8470708Z     return cache(*args, **kwargs)
2025-12-04T10:15:47.8471412Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:15:47.8472276Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:15:47.8474000Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8475492Z   target: aten.mm.default
2025-12-04T10:15:47.8475795Z   args[0]: TensorBox(
2025-12-04T10:15:47.8476058Z     ReinterpretView(
2025-12-04T10:15:47.8476328Z       StorageBox(
2025-12-04T10:15:47.8476880Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8477490Z       ),
2025-12-04T10:15:47.8477915Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8478546Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8478872Z       stack_traces = {,
2025-12-04T10:15:47.8479428Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8480056Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8480387Z       ,
2025-12-04T10:15:47.8480588Z       }
2025-12-04T10:15:47.8480801Z     )
2025-12-04T10:15:47.8481017Z   )
2025-12-04T10:15:47.8481240Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8481855Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8482538Z   ))
2025-12-04T10:15:47.8482657Z 
2025-12-04T10:15:47.8483357Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8484203Z 
2025-12-04T10:15:47.8484207Z 
2025-12-04T10:15:47.8484416Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8485338Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8486044Z 
2025-12-04T10:15:47.8486319Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8486935Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8487393Z frames [('total', 1)]
2025-12-04T10:15:47.8487695Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8488029Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8488467Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8488917Z graph_break []
2025-12-04T10:15:47.8489197Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8489645Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8490100Z frames [('total', 1)]
2025-12-04T10:15:47.8490385Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8490797Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8491270Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8491599Z graph_break []
2025-12-04T10:15:47.8491884Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8492268Z =================================== FAILURES ===================================
2025-12-04T10:15:47.8492838Z ___________ TestKernelBenchmark.test_slice_mm_bandwidth_computation ____________
2025-12-04T10:15:47.8493380Z Traceback (most recent call last):
2025-12-04T10:15:47.8494145Z   File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 403, in test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8494921Z     out = f(*inputs)
2025-12-04T10:15:47.8495572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:15:47.8496438Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:15:47.8497309Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:15:47.8498141Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:15:47.8499050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:15:47.8499825Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:15:47.8500628Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:15:47.8501766Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:15:47.8502741Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:15:47.8503593Z     graph.run(*example_inputs)
2025-12-04T10:15:47.8504205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:15:47.8504851Z     return super().run(*args)
2025-12-04T10:15:47.8505453Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:15:47.8506090Z     self.env[node] = self.run_node(node)
2025-12-04T10:15:47.8506770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:15:47.8507452Z     result = super().run_node(n)
2025-12-04T10:15:47.8508071Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:15:47.8508788Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:15:47.8509527Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:15:47.8510364Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:15:47.8511170Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:15:47.8511969Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:15:47.8512754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:15:47.8513428Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:15:47.8514103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:15:47.8514804Z     return autotune_select_algorithm(
2025-12-04T10:15:47.8515618Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:15:47.8516426Z     return cache(*args, **kwargs)
2025-12-04T10:15:47.8517134Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:15:47.8518011Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:15:47.8519653Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8521139Z   target: aten.mm.default
2025-12-04T10:15:47.8521436Z   args[0]: TensorBox(
2025-12-04T10:15:47.8521713Z     ReinterpretView(
2025-12-04T10:15:47.8521969Z       StorageBox(
2025-12-04T10:15:47.8522588Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8523207Z       ),
2025-12-04T10:15:47.8523628Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8524171Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8524514Z       stack_traces = {,
2025-12-04T10:15:47.8525067Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8525683Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8526018Z       ,
2025-12-04T10:15:47.8526237Z       }
2025-12-04T10:15:47.8526438Z     )
2025-12-04T10:15:47.8526655Z   )
2025-12-04T10:15:47.8526999Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8527605Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8528224Z   ))
2025-12-04T10:15:47.8528343Z 
2025-12-04T10:15:47.8529052Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8529877Z 
2025-12-04T10:15:47.8529940Z 
2025-12-04T10:15:47.8530165Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8531070Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8531785Z 
2025-12-04T10:15:47.8532046Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8532670Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8533135Z frames [('total', 1)]
2025-12-04T10:15:47.8533411Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8533745Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8534194Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8534636Z graph_break []
2025-12-04T10:15:47.8534912Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8535378Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8535832Z frames [('total', 1)]
2025-12-04T10:15:47.8536105Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8536526Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8536992Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8537311Z graph_break []
2025-12-04T10:15:47.8537587Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8538044Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:15:47.8538487Z frames [('total', 1)]
2025-12-04T10:15:47.8538772Z stats [('calls_captured', 2)]
2025-12-04T10:15:47.8539194Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:15:47.8539650Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T10:15:47.8539981Z graph_break []
2025-12-04T10:15:47.8540255Z aten_mm_info [('aten.mm_s97_2000_3000', 1)]
2025-12-04T10:15:47.8541283Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ab540be19127662e.xml -
2025-12-04T10:15:47.8542359Z =========================== short test summary info ============================
2025-12-04T10:15:47.8544450Z FAILED [0.1042s] inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:15:47.8546454Z   target: aten.mm.default
2025-12-04T10:15:47.8546756Z   args[0]: TensorBox(
2025-12-04T10:15:47.8547021Z     ReinterpretView(
2025-12-04T10:15:47.8547288Z       StorageBox(
2025-12-04T10:15:47.8547833Z         InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float16, size=[s97, 9000], stride=[9000, 1]))
2025-12-04T10:15:47.8548439Z       ),
2025-12-04T10:15:47.8548862Z       FixedLayout('cuda:0', torch.float16, size=[s97, 3000], stride=[9000, 1], offset=3000),
2025-12-04T10:15:47.8549414Z       origins=OrderedSet([slice_1]),
2025-12-04T10:15:47.8549752Z       stack_traces = {,
2025-12-04T10:15:47.8550292Z         File "/var/lib/jenkins/workspace/test/inductor/test_kernel_benchmark.py", line 396, in f,
2025-12-04T10:15:47.8550914Z           x = torch.narrow(a, 1, K, K),
2025-12-04T10:15:47.8551244Z       ,
2025-12-04T10:15:47.8551447Z       }
2025-12-04T10:15:47.8551661Z     )
2025-12-04T10:15:47.8551942Z   )
2025-12-04T10:15:47.8552173Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:15:47.8552793Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float16, size=[3000, 2000], stride=[2000, 1]))
2025-12-04T10:15:47.8553409Z   ))
2025-12-04T10:15:47.8553527Z 
2025-12-04T10:15:47.8554227Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:15:47.8555130Z 
2025-12-04T10:15:47.8555135Z 
2025-12-04T10:15:47.8555343Z To execute this test, run the following from the base repo dir:
2025-12-04T10:15:47.8556269Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_kernel_benchmark.py TestKernelBenchmark.test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8556968Z 
2025-12-04T10:15:47.8557245Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:15:47.8557834Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:15:47.8558344Z ================== 1 failed, 17 deselected, 2 rerun in 4.79s ===================
2025-12-04T10:15:47.8558786Z Got exit code 1
2025-12-04T10:15:47.8559455Z FAILED CONSISTENTLY: test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation
2025-12-04T10:15:47.8560480Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:15:47.8561458Z W1204 10:15:02.382000 36155 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:15:47.8562698Z Test results will be stored in test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ceb40d24a6394526.xml
2025-12-04T10:15:47.8563599Z ============================= test session starts ==============================
2025-12-04T10:15:47.8564242Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:15:47.8564837Z cachedir: .pytest_cache
2025-12-04T10:15:47.8565534Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:15:47.8566301Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:15:47.8566630Z configfile: pytest.ini
2025-12-04T10:15:47.8567337Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:15:47.8568213Z collecting ... collected 18 items / 15 deselected / 3 selected
2025-12-04T10:15:47.8568684Z stepcurrent: skipping 15 already run items.
2025-12-04T10:15:47.8569061Z Running 3 items in this shard
2025-12-04T10:15:47.8569264Z 
2025-12-04T10:15:47.8569663Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_split_scan PASSED [17.2639s] [ 33%]
2025-12-04T10:15:47.8570557Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_star_dep PASSED [12.8722s] [ 66%]
2025-12-04T10:15:47.8571548Z inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_unused_input_bandwidth_computation PASSED [13.2386s] [100%]
2025-12-04T10:15:47.8572188Z 
2025-12-04T10:15:47.8572964Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ceb40d24a6394526.xml -
2025-12-04T10:15:47.8574059Z ====================== 3 passed, 15 deselected in 43.40s =======================
2025-12-04T10:15:47.8574986Z The following tests failed consistently: ['test/inductor/test_kernel_benchmark.py::TestKernelBenchmark::test_slice_mm_bandwidth_computation']
2025-12-04T10:15:47.8575723Z 
2025-12-04T10:15:47.8576306Z FINISHED PRINTING LOG FILE of inductor/test_kernel_benchmark 1/1 (test/test-reports/inductor.test_kernel_benchmark_1.1_1e5eee0d44ae0f1a_.log)
2025-12-04T10:15:47.8577038Z 
2025-12-04T10:15:47.8577410Z Finished inductor/test_kernel_benchmark 1/1 ... [2025-12-04 10:15:47.800709][3705.410615958], took 4.73min
2025-12-04T10:15:47.8578811Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-96b82d738bd32122.xml
2025-12-04T10:15:47.8825481Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-5c40f8a5eb55b478.xml
2025-12-04T10:15:47.9127149Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ab540be19127662e.xml
2025-12-04T10:15:47.9467645Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ceb40d24a6394526.xml
2025-12-04T10:15:48.3112572Z Uploading logs for 57119749427 to S3
2025-12-04T10:15:48.3552672Z Uploading artifacts took 0.37 seconds
2025-12-04T10:15:48.3553110Z inductor/test_kernel_benchmark 1/1 failed!
2025-12-04T10:15:48.3557703Z Running inductor/test_torchinductor_opinfo 3/17 ... [2025-12-04 10:15:48.355592][3705.965500361]
2025-12-04T10:15:48.3558392Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:15:48.3562916Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=3', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:15:48.356027]
2025-12-04T10:23:55.6365138Z 
2025-12-04T10:23:55.6366672Z inductor/test_torchinductor_opinfo 3/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_3.17_09d50cf3d15b8ee9_.log
2025-12-04T10:23:55.6499993Z Running 231 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___getitem___cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___radd___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rdiv___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmod___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rmul___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rpow___cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive___rsub___cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__batch_norm_with_update_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__chunk_cat_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_abs_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addbmm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_alias_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_partial_views_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asinh_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_1d_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bincount_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_and_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_block_diag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_tensors_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_broadcast_to_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cat_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chunk_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_min_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_combinations_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_copysign_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_corrcoef_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cross_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_deg2rad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagflat_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dot_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_permuted_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_erfinv_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expm1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfftn_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_gather_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_grid_sampler_2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_half_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_heaviside_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hypot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_mean_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isin_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isreal_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_kron_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_le_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_inv_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vecdot_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logcumsumexp_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_xor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amin_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_cumprod_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_fill_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_log_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_std_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_matmul_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_with_dim_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_binary_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_min_reduction_with_dim_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmedian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_batch_norm_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ne_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_full_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv3d_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_embedding_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_selu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_silu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softplus_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softshrink_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softsign_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_fro_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_outer_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_conj_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sigmoid_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_hann_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_zeta_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_square_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_along_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tensor_split_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapz_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triangular_solve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_true_divide_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trunc_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_unbiased_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vdot_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_complex_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vsplit_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_vstack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zeros_cuda_float16
2025-12-04T10:23:55.6630499Z 
2025-12-04T10:23:55.6630917Z Finished inductor/test_torchinductor_opinfo 3/17 ... [2025-12-04 10:23:55.636809][4193.246718126], took 8.12min
2025-12-04T10:23:55.6632346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-3c3aadd8ccf63ac5.xml
2025-12-04T10:23:55.7251137Z Running inductor/test_torchinductor_opinfo 8/17 ... [2025-12-04 10:23:55.724806][4193.33471401]
2025-12-04T10:23:55.7251750Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:23:55.7254786Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=8', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:23:55.725192]
2025-12-04T10:34:25.0905132Z 
2025-12-04T10:34:25.0906231Z inductor/test_torchinductor_opinfo 8/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_8.17_f4805f992a426064_.log
2025-12-04T10:34:25.1016390Z Running 190 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acos_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmv_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_all_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_allclose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_aminmax_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_angle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_left_shift_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bitwise_not_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ceil_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_char_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_physical_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_count_nonzero_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cov_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cummax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_floor_rounding_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dsplit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dstack_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft_cuda_int64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifftshift_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ihfft_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fill_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fliplr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_geometric_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_i0_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_igamma_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_put_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_reduce_amax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isclose_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_isnan_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ldexp_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cond_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_cross_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_diagonal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_eig_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_pinv_singular_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_svd_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logaddexp2_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logcumsumexp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logit_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lu_unpack_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmin_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_logsumexp_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_scatter_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_sum_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nan_to_num_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanmean_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nanquantile_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_narrow_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_native_layer_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_empty_strided_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_new_ones_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nextafter_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_celu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_conv_transpose1d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_elu_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardshrink_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_selu_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_inf_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_norm_inf_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_positive_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_prod_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_qr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reciprocal_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_remainder_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_reshape_as_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_0_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_add_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_amax_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_prod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_searchsorted_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_short_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sign_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_bartlett_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinc_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sinh_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_slice_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_mm_reduce_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j1_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y1_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_list_args_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sqrt_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_multiple_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_stack_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_std_mean_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sub_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_take_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tan_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trapezoid_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_trunc_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_copy_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_where_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_xlogy_cuda_float64
2025-12-04T10:34:25.1124157Z 
2025-12-04T10:34:25.1124576Z Finished inductor/test_torchinductor_opinfo 8/17 ... [2025-12-04 10:34:25.090526][4822.70043385], took 10.49min
2025-12-04T10:34:25.1126013Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-61cf9773289d26de.xml
2025-12-04T10:34:25.1780104Z Running inductor/test_torchinductor_opinfo 13/17 ... [2025-12-04 10:34:25.177707][4822.787615175]
2025-12-04T10:34:25.1780725Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:34:25.1784180Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_torchinductor_opinfo.py', '--shard-id=13', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:34:25.178121]
2025-12-04T10:45:00.1141950Z 
2025-12-04T10:45:00.1146092Z inductor/test_torchinductor_opinfo 13/17 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_torchinductor_opinfo_13.17_50bb27b4d6383988_.log
2025-12-04T10:45:00.1269336Z Running 210 items in this shard: test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_H_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_T_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__softmax_backward_data_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive__unsafe_masked_index_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_acosh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_add_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addcdiv_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addmm_decomposed_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_addr_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_amin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmax_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argmin_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_argsort_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_as_strided_copy_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_asin_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atan_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atanh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_atleast_3d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bfloat16_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_bmm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cartesian_prod_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cdouble_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cfloat_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_chalf_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clamp_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_clone_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_conj_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_constant_pad_nd_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_contiguous_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cos_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cosh_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumprod_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_cumsum_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diag_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diagonal_scatter_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_diff_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_digamma_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_dist_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_div_trunc_rounding_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_double_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_empty_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exp2_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_expand_as_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_exponential_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_eye_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_fftshift_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_hfft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_ifft2_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfft2_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_irfftn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_fft_rfftn_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flip_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_flipud_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frac_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_frexp_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_full_like_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ge_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hash_tensor_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_hstack_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_fill_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_index_select_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_int_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lerp_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_ldl_factor_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_matrix_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_multi_dot_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_solve_triangular_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_tensorsolve_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linalg_vander_cuda_uint8, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_linspace_tensor_overload_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log10_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log1p_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_log_softmax_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_logical_or_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_long_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_lt_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mT_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_amax_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_argmax_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_prod_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_select_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_masked_softmin_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_binary_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_pool2d_with_indices_backward_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_max_reduction_no_dim_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_minimum_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mode_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_msort_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nansum_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_fractional_max_pool2d_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_hardswish_cuda_float16, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_linear_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_local_response_norm_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_mish_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_multi_margin_loss_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_normalize_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_threshold_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_nn_functional_unfold_cuda_bool, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_normal_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ones_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_permute_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_pow_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_put_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rad2deg_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randint_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_randn_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_ravel_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_real_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_repeat_interleave_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resize_as__cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_resolve_neg_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_round_decimals_neg_3_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_rsqrt_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scalar_tensor_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_scatter_reduce_mean_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_select_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_exponential_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signal_windows_kaiser_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_signbit_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_softmax_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sort_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sparse_mm_reduce_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_airy_ai_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_j0_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_bessel_y0_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_entr_cuda_int32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i0e_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_i1_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_log_ndtr_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_ndtri_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_u_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_w_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_special_xlog1py_cuda_float32, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_split_with_sizes_copy_cuda_uint8, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_squeeze_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_sum_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_t_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_tile_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_to_sparse_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_topk_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_transpose_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_triangular_solve_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unbind_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unflatten_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_copy_cuda_int64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unfold_cuda_float64, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsafe_chunk_cuda_float64, 
test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_unsqueeze_copy_cuda_bool, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_var_mean_unbiased_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_as_cuda_float32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_float16, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_view_copy_cuda_int32, test/inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_zero__cuda_float32
2025-12-04T10:45:00.1389807Z 
2025-12-04T10:45:00.1390222Z Finished inductor/test_torchinductor_opinfo 13/17 ... [2025-12-04 10:45:00.114285][5457.72419387], took 10.58min
2025-12-04T10:45:00.1391683Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-bddaa2f603017d2f.xml
2025-12-04T10:45:00.5384128Z Uploading artifacts took 0.34 seconds
2025-12-04T10:45:00.5388283Z Running inductor/test_pattern_matcher 1/1 ... [2025-12-04 10:45:00.538632][5458.148539749]
2025-12-04T10:45:00.5389046Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:45:00.5393309Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_pattern_matcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:45:00.539059]
2025-12-04T10:46:59.4237778Z 
2025-12-04T10:46:59.4239080Z PRINTING LOG FILE of inductor/test_pattern_matcher 1/1 (test/test-reports/inductor.test_pattern_matcher_1.1_3ae84ddebdf6dbd7_.log)
2025-12-04T10:46:59.4240967Z W1204 10:45:09.196000 77296 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:46:59.4242216Z Test results will be stored in test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-283ddf549cce6309.xml
2025-12-04T10:46:59.4243116Z ============================= test session starts ==============================
2025-12-04T10:46:59.4243794Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:46:59.4244387Z cachedir: .pytest_cache
2025-12-04T10:46:59.4245071Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:46:59.4245845Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:46:59.4247447Z configfile: pytest.ini
2025-12-04T10:46:59.4248195Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:46:59.4248971Z collecting ... collected 52 items
2025-12-04T10:46:59.4249377Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T10:46:59.4272178Z Running 52 items in this shard: test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_alpha_beta_with_pointwise, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_broadcasting_bias, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_dtype_mismatch, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_symbolic_scalar, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_bmm_to_mm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_bound_method_pattern_matcher, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_addmm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_mm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_slice_cat_cuda, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_slice_cat_xpu, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_splitwithsizes, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_duplicate_search, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul_epilogue, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul_gating, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_fwd_only_generate_original_aten_meta, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_input_output_same, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations1, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations2, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations3, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_with_mutation, 
test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_bad_cases, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_cpu, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_epi_works, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_exhaustive_dtypes, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_gating, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mm_plus_mm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_multioutput_register_replacement, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_mutation_op_matching, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_convert, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_cumsum, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_permute_pair, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_permute_pair_3d, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_view_pair, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_view_pair_dynamic_shapes, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_remove_noop_pass_with_remove_passes, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_remove_pointless_clones, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_replace_mul_zero, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_scaled_softmax, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_serialized_patterns_up_to_date, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_splitwithsizes_cat, 
test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_stable_topological_sort, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case0, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case1, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case2, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_symint_pattern_matching, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_unfuse_bias_addmm, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_unsuccessful_partial_reuse_case0, test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_unsuccessful_partial_reuse_case1
2025-12-04T10:46:59.4295436Z 
2025-12-04T10:46:59.4296229Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm W1204 10:45:14.863000 77296 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:46:59.4297243Z PASSED [6.6532s] [  1%]
2025-12-04T10:46:59.4297894Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_alpha_beta_with_pointwise PASSED [0.6349s] [  3%]
2025-12-04T10:46:59.4298916Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_broadcasting_bias PASSED [0.1976s] [  5%]
2025-12-04T10:46:59.4299894Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_dtype_mismatch PASSED [0.5110s] [  7%]
2025-12-04T10:46:59.4301049Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_addmm_symbolic_scalar PASSED [0.7121s] [  9%]
2025-12-04T10:46:59.4301948Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_bmm_to_mm PASSED [0.2353s] [ 11%]
2025-12-04T10:46:59.4302891Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_bound_method_pattern_matcher PASSED [1.2701s] [ 13%]
2025-12-04T10:46:59.4303840Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_addmm PASSED [0.1424s] [ 15%]
2025-12-04T10:46:59.4304672Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_mm PASSED [0.1369s] [ 17%]
2025-12-04T10:46:59.4305542Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_slice_cat_cuda PASSED [1.3428s] [ 19%]
2025-12-04T10:46:59.4306479Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_slice_cat_xpu PASSED [1.2430s] [ 21%]
2025-12-04T10:46:59.4307417Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_cat_splitwithsizes PASSED [2.3046s] [ 23%]
2025-12-04T10:46:59.4308363Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_duplicate_search PASSED [0.1865s] [ 25%]
2025-12-04T10:46:59.4309402Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul SKIPPED [0.0003s] (templates require big gpu) [ 26%]
2025-12-04T10:46:59.4310622Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul_epilogue SKIPPED [0.0002s] (templates require big gpu) [ 28%]
2025-12-04T10:46:59.4311875Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_fused_int_mm_mul_gating SKIPPED [0.0002s] (templates require big gpu) [ 30%]
2025-12-04T10:46:59.4313041Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_fwd_only_generate_original_aten_meta PASSED [0.0088s] [ 32%]
2025-12-04T10:46:59.4314045Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_input_output_same PASSED [0.7180s] [ 34%]
2025-12-04T10:46:59.4315073Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations1 PASSED [0.6273s] [ 36%]
2025-12-04T10:46:59.4316207Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations2 PASSED [0.5773s] [ 38%]
2025-12-04T10:46:59.4317336Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_equivalent_function_invocations3 PASSED [0.5795s] [ 40%]
2025-12-04T10:46:59.4318519Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_match_with_mutation PASSED [1.2711s] [ 42%]
2025-12-04T10:46:59.4319554Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm SKIPPED [0.0003s] (templates require big gpu) [ 44%]
2025-12-04T10:46:59.4320704Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_bad_cases SKIPPED [0.0002s] (templates require big gpu) [ 46%]
2025-12-04T10:46:59.4321805Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_cpu PASSED [0.7236s] [ 48%]
2025-12-04T10:46:59.4322835Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_epi_works SKIPPED [0.0003s] (templates require big gpu) [ 50%]
2025-12-04T10:46:59.4324191Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_exhaustive_dtypes SKIPPED [0.0002s] (templates require big gpu) [ 51%]
2025-12-04T10:46:59.4325423Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mixed_mm_gating SKIPPED [0.0002s] (templates require big gpu) [ 53%]
2025-12-04T10:46:59.4326450Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mm_plus_mm PASSED [0.7748s] [ 55%]
2025-12-04T10:46:59.4327409Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_multioutput_register_replacement PASSED [0.8276s] [ 57%]
2025-12-04T10:46:59.4328438Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_mutation_op_matching PASSED [0.0048s] [ 59%]
2025-12-04T10:46:59.4329569Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm ('RERUN', {'yellow': True}) [0.1245s] [ 61%]
2025-12-04T10:46:59.4330848Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm ('RERUN', {'yellow': True}) [0.0886s] [ 61%]
2025-12-04T10:46:59.4332021Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm FAILED [0.0865s] [ 61%]
2025-12-04T10:46:59.4332644Z 
2025-12-04T10:46:59.4332783Z ==================================== RERUNS ====================================
2025-12-04T10:46:59.4333358Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________
2025-12-04T10:46:59.4333905Z Traceback (most recent call last):
2025-12-04T10:46:59.4334690Z   File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4335523Z     ret, code = run_and_get_code(opt_fn, *args)
2025-12-04T10:46:59.4336269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:46:59.4336986Z     result = fn(*args, **kwargs)
2025-12-04T10:46:59.4337668Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:46:59.4338523Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:46:59.4339405Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:46:59.4340226Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:46:59.4341050Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:46:59.4341832Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:46:59.4342630Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:46:59.4343590Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:46:59.4344562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:46:59.4345322Z     graph.run(*example_inputs)
2025-12-04T10:46:59.4345929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:46:59.4346558Z     return super().run(*args)
2025-12-04T10:46:59.4347232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:46:59.4347891Z     self.env[node] = self.run_node(node)
2025-12-04T10:46:59.4348549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:46:59.4349236Z     result = super().run_node(n)
2025-12-04T10:46:59.4349882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:46:59.4350609Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:46:59.4351427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:46:59.4352262Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:46:59.4353095Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:46:59.4353884Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:46:59.4354672Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:46:59.4355364Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:46:59.4356042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:46:59.4356731Z     return autotune_select_algorithm(
2025-12-04T10:46:59.4357546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:46:59.4358373Z     return cache(*args, **kwargs)
2025-12-04T10:46:59.4359079Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:46:59.4359948Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:46:59.4361594Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4363168Z   target: aten.mm.default
2025-12-04T10:46:59.4363485Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4364073Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4364677Z   ))
2025-12-04T10:46:59.4364919Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4365505Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4366103Z   ))
2025-12-04T10:46:59.4366246Z 
2025-12-04T10:46:59.4366958Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4367784Z 
2025-12-04T10:46:59.4367788Z 
2025-12-04T10:46:59.4368005Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4368961Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4369702Z 
2025-12-04T10:46:59.4369965Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4370593Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4371043Z frames [('total', 1)]
2025-12-04T10:46:59.4371344Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4371779Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4372468Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4373030Z graph_break []
2025-12-04T10:46:59.4373308Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4373921Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________
2025-12-04T10:46:59.4374457Z Traceback (most recent call last):
2025-12-04T10:46:59.4375260Z   File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4376104Z     ret, code = run_and_get_code(opt_fn, *args)
2025-12-04T10:46:59.4376858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:46:59.4377562Z     result = fn(*args, **kwargs)
2025-12-04T10:46:59.4378325Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:46:59.4379181Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:46:59.4380058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:46:59.4380885Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:46:59.4381711Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:46:59.4382500Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:46:59.4383283Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:46:59.4384257Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:46:59.4385223Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:46:59.4385984Z     graph.run(*example_inputs)
2025-12-04T10:46:59.4386581Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:46:59.4387223Z     return super().run(*args)
2025-12-04T10:46:59.4387834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:46:59.4388475Z     self.env[node] = self.run_node(node)
2025-12-04T10:46:59.4389155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:46:59.4389830Z     result = super().run_node(n)
2025-12-04T10:46:59.4390471Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:46:59.4391181Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:46:59.4391933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:46:59.4392763Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:46:59.4393573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:46:59.4394373Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:46:59.4395148Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:46:59.4395832Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:46:59.4396495Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:46:59.4397197Z     return autotune_select_algorithm(
2025-12-04T10:46:59.4398013Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:46:59.4398842Z     return cache(*args, **kwargs)
2025-12-04T10:46:59.4399535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:46:59.4400418Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:46:59.4402426Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4403935Z   target: aten.mm.default
2025-12-04T10:46:59.4404241Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4404843Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4405450Z   ))
2025-12-04T10:46:59.4405678Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4406362Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4406963Z   ))
2025-12-04T10:46:59.4407083Z 
2025-12-04T10:46:59.4407783Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4408628Z 
2025-12-04T10:46:59.4408633Z 
2025-12-04T10:46:59.4408856Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4409796Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4410520Z 
2025-12-04T10:46:59.4410792Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4411415Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4411862Z frames [('total', 1)]
2025-12-04T10:46:59.4412160Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4412591Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4413257Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4413831Z graph_break []
2025-12-04T10:46:59.4414102Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4414544Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4414999Z frames [('total', 1)]
2025-12-04T10:46:59.4415287Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4415713Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4416378Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4416942Z graph_break []
2025-12-04T10:46:59.4417209Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4417578Z =================================== FAILURES ===================================
2025-12-04T10:46:59.4418158Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________
2025-12-04T10:46:59.4418703Z Traceback (most recent call last):
2025-12-04T10:46:59.4419496Z   File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4420315Z     ret, code = run_and_get_code(opt_fn, *args)
2025-12-04T10:46:59.4421060Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:46:59.4421773Z     result = fn(*args, **kwargs)
2025-12-04T10:46:59.4422456Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:46:59.4423312Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:46:59.4424197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:46:59.4425030Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:46:59.4425837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:46:59.4426621Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:46:59.4427486Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:46:59.4428468Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:46:59.4429427Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:46:59.4430188Z     graph.run(*example_inputs)
2025-12-04T10:46:59.4430797Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:46:59.4431427Z     return super().run(*args)
2025-12-04T10:46:59.4432146Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:46:59.4432807Z     self.env[node] = self.run_node(node)
2025-12-04T10:46:59.4433489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:46:59.4434158Z     result = super().run_node(n)
2025-12-04T10:46:59.4434805Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:46:59.4435532Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:46:59.4436265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:46:59.4437105Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:46:59.4437929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:46:59.4438743Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:46:59.4439511Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:46:59.4440205Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:46:59.4440883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:46:59.4441591Z     return autotune_select_algorithm(
2025-12-04T10:46:59.4442468Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:46:59.4443298Z     return cache(*args, **kwargs)
2025-12-04T10:46:59.4444010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:46:59.4444877Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:46:59.4446523Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4448011Z   target: aten.mm.default
2025-12-04T10:46:59.4448322Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4448913Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4449513Z   ))
2025-12-04T10:46:59.4449753Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4450333Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4450927Z   ))
2025-12-04T10:46:59.4451055Z 
2025-12-04T10:46:59.4451755Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4452585Z 
2025-12-04T10:46:59.4452589Z 
2025-12-04T10:46:59.4452810Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4453751Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4454477Z 
2025-12-04T10:46:59.4454738Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4455457Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4455914Z frames [('total', 1)]
2025-12-04T10:46:59.4456196Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4456629Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4457312Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4457882Z graph_break []
2025-12-04T10:46:59.4458143Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4458663Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4459124Z frames [('total', 1)]
2025-12-04T10:46:59.4459402Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4459830Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4460506Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4461068Z graph_break []
2025-12-04T10:46:59.4461337Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4461786Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4462241Z frames [('total', 1)]
2025-12-04T10:46:59.4462515Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4462941Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4463620Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4464176Z graph_break []
2025-12-04T10:46:59.4464446Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4465453Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-283ddf549cce6309.xml -
2025-12-04T10:46:59.4466526Z =========================== short test summary info ============================
2025-12-04T10:46:59.4468622Z FAILED [0.0865s] inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4470645Z   target: aten.mm.default
2025-12-04T10:46:59.4470959Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4471558Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4472147Z   ))
2025-12-04T10:46:59.4472388Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4472977Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4473561Z   ))
2025-12-04T10:46:59.4473690Z 
2025-12-04T10:46:59.4474389Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4475225Z 
2025-12-04T10:46:59.4475229Z 
2025-12-04T10:46:59.4475441Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4476383Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4477107Z 
2025-12-04T10:46:59.4477367Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4477953Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:46:59.4478475Z ============== 1 failed, 23 passed, 8 skipped, 2 rerun in 22.06s ===============
2025-12-04T10:46:59.4478922Z Got exit code 1
2025-12-04T10:46:59.4479174Z Retrying single test...
2025-12-04T10:46:59.4479796Z W1204 10:45:42.090000 78135 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:46:59.4481034Z Test results will be stored in test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-1b5ebcdca18d4e19.xml
2025-12-04T10:46:59.4482009Z ============================= test session starts ==============================
2025-12-04T10:46:59.4482647Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:46:59.4483236Z cachedir: .pytest_cache
2025-12-04T10:46:59.4483934Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:46:59.4484765Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:46:59.4485110Z configfile: pytest.ini
2025-12-04T10:46:59.4485818Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:46:59.4486687Z collecting ... collected 52 items / 51 deselected / 1 selected
2025-12-04T10:46:59.4487698Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4488611Z Running 1 items in this shard
2025-12-04T10:46:59.4488819Z 
2025-12-04T10:46:59.4489753Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm W1204 10:45:46.380000 78135 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:46:59.4490946Z ('RERUN', {'yellow': True}) [4.2298s] [100%]
2025-12-04T10:46:59.4491773Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm ('RERUN', {'yellow': True}) [0.0895s] [100%]
2025-12-04T10:46:59.4492974Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm FAILED [0.0887s] [100%]
2025-12-04T10:46:59.4493588Z 
2025-12-04T10:46:59.4493746Z ==================================== RERUNS ====================================
2025-12-04T10:46:59.4494322Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________
2025-12-04T10:46:59.4494855Z Traceback (most recent call last):
2025-12-04T10:46:59.4495647Z   File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4496479Z     ret, code = run_and_get_code(opt_fn, *args)
2025-12-04T10:46:59.4497209Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:46:59.4497928Z     result = fn(*args, **kwargs)
2025-12-04T10:46:59.4498619Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:46:59.4499475Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:46:59.4500346Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:46:59.4501353Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:46:59.4502183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:46:59.4502970Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:46:59.4503761Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:46:59.4504738Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:46:59.4505716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:46:59.4506466Z     graph.run(*example_inputs)
2025-12-04T10:46:59.4507080Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:46:59.4507728Z     return super().run(*args)
2025-12-04T10:46:59.4508446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:46:59.4509093Z     self.env[node] = self.run_node(node)
2025-12-04T10:46:59.4509773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:46:59.4510455Z     result = super().run_node(n)
2025-12-04T10:46:59.4511086Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:46:59.4511895Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:46:59.4512637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:46:59.4513472Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:46:59.4514282Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:46:59.4515092Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:46:59.4515871Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:46:59.4516561Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:46:59.4517222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:46:59.4517923Z     return autotune_select_algorithm(
2025-12-04T10:46:59.4518735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:46:59.4519547Z     return cache(*args, **kwargs)
2025-12-04T10:46:59.4520245Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:46:59.4521118Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:46:59.4522842Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4524319Z   target: aten.mm.default
2025-12-04T10:46:59.4524636Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4525240Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4525849Z   ))
2025-12-04T10:46:59.4526075Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4526676Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4527277Z   ))
2025-12-04T10:46:59.4527395Z 
2025-12-04T10:46:59.4528096Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4528936Z 
2025-12-04T10:46:59.4528941Z 
2025-12-04T10:46:59.4529155Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4530096Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4530817Z 
2025-12-04T10:46:59.4531088Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4531697Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4532158Z frames [('total', 1)]
2025-12-04T10:46:59.4532447Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4532985Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4533657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4534106Z graph_break []
2025-12-04T10:46:59.4534378Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4534968Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________
2025-12-04T10:46:59.4535519Z Traceback (most recent call last):
2025-12-04T10:46:59.4536323Z   File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4537162Z     ret, code = run_and_get_code(opt_fn, *args)
2025-12-04T10:46:59.4537894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:46:59.4538669Z     result = fn(*args, **kwargs)
2025-12-04T10:46:59.4539363Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:46:59.4540203Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:46:59.4541084Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:46:59.4541914Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:46:59.4542733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:46:59.4543502Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:46:59.4544300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:46:59.4545273Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:46:59.4546243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:46:59.4546993Z     graph.run(*example_inputs)
2025-12-04T10:46:59.4547600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:46:59.4548243Z     return super().run(*args)
2025-12-04T10:46:59.4548834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:46:59.4549488Z     self.env[node] = self.run_node(node)
2025-12-04T10:46:59.4550163Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:46:59.4550841Z     result = super().run_node(n)
2025-12-04T10:46:59.4551470Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:46:59.4552195Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:46:59.4552937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:46:59.4553756Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:46:59.4554577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:46:59.4555380Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:46:59.4556156Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:46:59.4556830Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:46:59.4557505Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:46:59.4558208Z     return autotune_select_algorithm(
2025-12-04T10:46:59.4559017Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:46:59.4559833Z     return cache(*args, **kwargs)
2025-12-04T10:46:59.4560545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:46:59.4561428Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:46:59.4563229Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4564716Z   target: aten.mm.default
2025-12-04T10:46:59.4565031Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4565626Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4566217Z   ))
2025-12-04T10:46:59.4566511Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4567100Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4567684Z   ))
2025-12-04T10:46:59.4567812Z 
2025-12-04T10:46:59.4568510Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4569347Z 
2025-12-04T10:46:59.4569356Z 
2025-12-04T10:46:59.4569567Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4570506Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4571227Z 
2025-12-04T10:46:59.4571502Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4572112Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4572575Z frames [('total', 1)]
2025-12-04T10:46:59.4572871Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4573405Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4574099Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4574549Z graph_break []
2025-12-04T10:46:59.4574825Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4587444Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4587915Z frames [('total', 1)]
2025-12-04T10:46:59.4588197Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4588641Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4589331Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4589889Z graph_break []
2025-12-04T10:46:59.4590140Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4590540Z =================================== FAILURES ===================================
2025-12-04T10:46:59.4591121Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________
2025-12-04T10:46:59.4591654Z Traceback (most recent call last):
2025-12-04T10:46:59.4592457Z   File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4593297Z     ret, code = run_and_get_code(opt_fn, *args)
2025-12-04T10:46:59.4594045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:46:59.4594745Z     result = fn(*args, **kwargs)
2025-12-04T10:46:59.4595445Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:46:59.4596299Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:46:59.4597175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:46:59.4598012Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:46:59.4598833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:46:59.4599618Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:46:59.4600533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:46:59.4601803Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:46:59.4602779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:46:59.4603550Z     graph.run(*example_inputs)
2025-12-04T10:46:59.4604150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:46:59.4604917Z     return super().run(*args)
2025-12-04T10:46:59.4605518Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:46:59.4606156Z     self.env[node] = self.run_node(node)
2025-12-04T10:46:59.4606835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:46:59.4607513Z     result = super().run_node(n)
2025-12-04T10:46:59.4608159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:46:59.4608875Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:46:59.4609623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:46:59.4610454Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:46:59.4611265Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:46:59.4612078Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:46:59.4612857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:46:59.4613546Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:46:59.4614211Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:46:59.4614915Z     return autotune_select_algorithm(
2025-12-04T10:46:59.4615730Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:46:59.4616541Z     return cache(*args, **kwargs)
2025-12-04T10:46:59.4617253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:46:59.4618132Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:46:59.4619777Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4621263Z   target: aten.mm.default
2025-12-04T10:46:59.4621562Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4622169Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4622767Z   ))
2025-12-04T10:46:59.4622993Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4623585Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4624186Z   ))
2025-12-04T10:46:59.4624307Z 
2025-12-04T10:46:59.4625011Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4625851Z 
2025-12-04T10:46:59.4625855Z 
2025-12-04T10:46:59.4626068Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4627014Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4627741Z 
2025-12-04T10:46:59.4628128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4628756Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4629207Z frames [('total', 1)]
2025-12-04T10:46:59.4629503Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4630049Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4630726Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4631234Z graph_break []
2025-12-04T10:46:59.4631506Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4631946Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4632400Z frames [('total', 1)]
2025-12-04T10:46:59.4632686Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4633110Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4633789Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4634357Z graph_break []
2025-12-04T10:46:59.4634627Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4635064Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4635517Z frames [('total', 1)]
2025-12-04T10:46:59.4635802Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4636213Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4636882Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4637448Z graph_break []
2025-12-04T10:46:59.4637718Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4638720Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-1b5ebcdca18d4e19.xml -
2025-12-04T10:46:59.4639792Z =========================== short test summary info ============================
2025-12-04T10:46:59.4641975Z FAILED [0.0887s] inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4644002Z   target: aten.mm.default
2025-12-04T10:46:59.4644303Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4644909Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4645510Z   ))
2025-12-04T10:46:59.4645735Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4646331Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4646930Z   ))
2025-12-04T10:46:59.4647050Z 
2025-12-04T10:46:59.4647760Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4648586Z 
2025-12-04T10:46:59.4648591Z 
2025-12-04T10:46:59.4648802Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4649743Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4650483Z 
2025-12-04T10:46:59.4650753Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4651335Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:46:59.4651839Z ================== 1 failed, 51 deselected, 2 rerun in 4.44s ===================
2025-12-04T10:46:59.4652278Z Got exit code 1
2025-12-04T10:46:59.4652542Z Retrying single test...
2025-12-04T10:46:59.4653345Z W1204 10:46:00.368000 78304 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:46:59.4654493Z Test results will be stored in test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-e19a61202ca16580.xml
2025-12-04T10:46:59.4655388Z ============================= test session starts ==============================
2025-12-04T10:46:59.4656042Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:46:59.4656623Z cachedir: .pytest_cache
2025-12-04T10:46:59.4657381Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:46:59.4658148Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:46:59.4658495Z configfile: pytest.ini
2025-12-04T10:46:59.4659192Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:46:59.4660072Z collecting ... collected 52 items / 51 deselected / 1 selected
2025-12-04T10:46:59.4661099Z stepcurrent: skipping 31 already run items. Running only test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4662015Z Running 1 items in this shard
2025-12-04T10:46:59.4662221Z 
2025-12-04T10:46:59.4663139Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm W1204 10:46:04.679000 78304 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:46:59.4664317Z ('RERUN', {'yellow': True}) [4.2499s] [100%]
2025-12-04T10:46:59.4665150Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm ('RERUN', {'yellow': True}) [0.0902s] [100%]
2025-12-04T10:46:59.4666342Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm FAILED [0.0875s] [100%]
2025-12-04T10:46:59.4666956Z 
2025-12-04T10:46:59.4667100Z ==================================== RERUNS ====================================
2025-12-04T10:46:59.4667669Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________
2025-12-04T10:46:59.4668211Z Traceback (most recent call last):
2025-12-04T10:46:59.4668993Z   File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4669821Z     ret, code = run_and_get_code(opt_fn, *args)
2025-12-04T10:46:59.4670560Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:46:59.4671277Z     result = fn(*args, **kwargs)
2025-12-04T10:46:59.4671954Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:46:59.4672809Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:46:59.4673695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:46:59.4674523Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:46:59.4675330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:46:59.4676107Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:46:59.4676902Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:46:59.4677881Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:46:59.4678835Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:46:59.4679596Z     graph.run(*example_inputs)
2025-12-04T10:46:59.4680205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:46:59.4680901Z     return super().run(*args)
2025-12-04T10:46:59.4681502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:46:59.4682230Z     self.env[node] = self.run_node(node)
2025-12-04T10:46:59.4682914Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:46:59.4683583Z     result = super().run_node(n)
2025-12-04T10:46:59.4684228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:46:59.4685020Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:46:59.4685749Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:46:59.4686578Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:46:59.4687408Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:46:59.4688206Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:46:59.4688969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:46:59.4689659Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:46:59.4690331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:46:59.4691019Z     return autotune_select_algorithm(
2025-12-04T10:46:59.4691834Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:46:59.4692653Z     return cache(*args, **kwargs)
2025-12-04T10:46:59.4693360Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:46:59.4694229Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:46:59.4695869Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4697357Z   target: aten.mm.default
2025-12-04T10:46:59.4697667Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4698248Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4698857Z   ))
2025-12-04T10:46:59.4699099Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4699678Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4700277Z   ))
2025-12-04T10:46:59.4700407Z 
2025-12-04T10:46:59.4701284Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4702119Z 
2025-12-04T10:46:59.4702123Z 
2025-12-04T10:46:59.4702347Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4703292Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4704015Z 
2025-12-04T10:46:59.4704277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4704907Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4705364Z frames [('total', 1)]
2025-12-04T10:46:59.4705647Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4706192Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4706878Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4707327Z graph_break []
2025-12-04T10:46:59.4707716Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4708244Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________
2025-12-04T10:46:59.4708789Z Traceback (most recent call last):
2025-12-04T10:46:59.4709566Z   File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4710399Z     ret, code = run_and_get_code(opt_fn, *args)
2025-12-04T10:46:59.4711139Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:46:59.4711958Z     result = fn(*args, **kwargs)
2025-12-04T10:46:59.4712637Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:46:59.4713492Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:46:59.4714382Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:46:59.4715198Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:46:59.4716014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:46:59.4716796Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:46:59.4717593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:46:59.4718568Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:46:59.4719523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:46:59.4720287Z     graph.run(*example_inputs)
2025-12-04T10:46:59.4720897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:46:59.4721548Z     return super().run(*args)
2025-12-04T10:46:59.4722201Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:46:59.4722857Z     self.env[node] = self.run_node(node)
2025-12-04T10:46:59.4723535Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:46:59.4724197Z     result = super().run_node(n)
2025-12-04T10:46:59.4724840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:46:59.4725572Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:46:59.4726316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:46:59.4727128Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:46:59.4727952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:46:59.4728750Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:46:59.4729529Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:46:59.4730203Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:46:59.4730875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:46:59.4731577Z     return autotune_select_algorithm(
2025-12-04T10:46:59.4732378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:46:59.4733199Z     return cache(*args, **kwargs)
2025-12-04T10:46:59.4733903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:46:59.4734783Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:46:59.4736527Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4738015Z   target: aten.mm.default
2025-12-04T10:46:59.4738335Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4738942Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4740097Z   ))
2025-12-04T10:46:59.4740340Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4740935Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4741520Z   ))
2025-12-04T10:46:59.4741653Z 
2025-12-04T10:46:59.4742354Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4743190Z 
2025-12-04T10:46:59.4743195Z 
2025-12-04T10:46:59.4743403Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4744343Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4745064Z 
2025-12-04T10:46:59.4745337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4745951Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4746408Z frames [('total', 1)]
2025-12-04T10:46:59.4746699Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4747229Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4747913Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4748362Z graph_break []
2025-12-04T10:46:59.4748626Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4749078Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4749528Z frames [('total', 1)]
2025-12-04T10:46:59.4749816Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4750227Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4750906Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4751470Z graph_break []
2025-12-04T10:46:59.4751736Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4752116Z =================================== FAILURES ===================================
2025-12-04T10:46:59.4752688Z _________ TestPatternMatcher.test_original_aten_preserved_split_addmm __________
2025-12-04T10:46:59.4753220Z Traceback (most recent call last):
2025-12-04T10:46:59.4754020Z   File "/var/lib/jenkins/workspace/test/inductor/test_pattern_matcher.py", line 1322, in test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4754860Z     ret, code = run_and_get_code(opt_fn, *args)
2025-12-04T10:46:59.4755603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:46:59.4756299Z     result = fn(*args, **kwargs)
2025-12-04T10:46:59.4756985Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 940, in compile_wrapper
2025-12-04T10:46:59.4757838Z     raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
2025-12-04T10:46:59.4758927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1019, in _compile_fx_inner
2025-12-04T10:46:59.4759752Z     raise InductorError(e, currentframe()).with_traceback(
2025-12-04T10:46:59.4760573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1003, in _compile_fx_inner
2025-12-04T10:46:59.4761353Z     mb_compiled_graph = fx_codegen_and_compile(
2025-12-04T10:46:59.4762307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1757, in fx_codegen_and_compile
2025-12-04T10:46:59.4763286Z     return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
2025-12-04T10:46:59.4764252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1452, in codegen_and_compile
2025-12-04T10:46:59.4765014Z     graph.run(*example_inputs)
2025-12-04T10:46:59.4765609Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 987, in run
2025-12-04T10:46:59.4766324Z     return super().run(*args)
2025-12-04T10:46:59.4766924Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 200, in run
2025-12-04T10:46:59.4767582Z     self.env[node] = self.run_node(node)
2025-12-04T10:46:59.4768250Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1726, in run_node
2025-12-04T10:46:59.4768937Z     result = super().run_node(n)
2025-12-04T10:46:59.4769582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/fx/interpreter.py", line 295, in run_node
2025-12-04T10:46:59.4770295Z     return getattr(self, n.op)(n.target, args, kwargs)
2025-12-04T10:46:59.4771035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1369, in call_function
2025-12-04T10:46:59.4771862Z     raise LoweringException(e, target, args, kwargs).with_traceback(
2025-12-04T10:46:59.4772688Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 1359, in call_function
2025-12-04T10:46:59.4773471Z     out = lowerings[target](*args, **kwargs)  # type: ignore[index]
2025-12-04T10:46:59.4774247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/lowering.py", line 495, in wrapped
2025-12-04T10:46:59.4774935Z     out = decomp_fn(*args, **kwargs)
2025-12-04T10:46:59.4775601Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/kernel/mm.py", line 525, in tuned_mm
2025-12-04T10:46:59.4776313Z     return autotune_select_algorithm(
2025-12-04T10:46:59.4777127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 4298, in autotune_select_algorithm
2025-12-04T10:46:59.4777951Z     return cache(*args, **kwargs)
2025-12-04T10:46:59.4778643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2777, in __call__
2025-12-04T10:46:59.4779530Z     raise self.create_no_valid_choices(name, "No choices exist for backend.")
2025-12-04T10:46:59.4781326Z torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4782818Z   target: aten.mm.default
2025-12-04T10:46:59.4783126Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4783735Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4784337Z   ))
2025-12-04T10:46:59.4784563Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4785150Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4785746Z   ))
2025-12-04T10:46:59.4785861Z 
2025-12-04T10:46:59.4786570Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4787401Z 
2025-12-04T10:46:59.4787406Z 
2025-12-04T10:46:59.4787616Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4788562Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4789384Z 
2025-12-04T10:46:59.4789648Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4790268Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4790720Z frames [('total', 1)]
2025-12-04T10:46:59.4791008Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4791672Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4792406Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4792920Z graph_break []
2025-12-04T10:46:59.4793192Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4793648Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4794093Z frames [('total', 1)]
2025-12-04T10:46:59.4794381Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4794802Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4795475Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4796044Z graph_break []
2025-12-04T10:46:59.4796313Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4796751Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:46:59.4797207Z frames [('total', 1)]
2025-12-04T10:46:59.4797494Z stats [('calls_captured', 2)]
2025-12-04T10:46:59.4797924Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:46:59.4798597Z inductor [('fxgraph_cache_miss', 1), ('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1)]
2025-12-04T10:46:59.4799163Z graph_break []
2025-12-04T10:46:59.4799434Z aten_mm_info [('aten.mm_16_32_24', 1)]
2025-12-04T10:46:59.4800427Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-e19a61202ca16580.xml -
2025-12-04T10:46:59.4801663Z =========================== short test summary info ============================
2025-12-04T10:46:59.4803835Z FAILED [0.0875s] inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm - torch._inductor.exc.InductorError: LoweringException: NoValidChoicesError: No choices to select. Provided reason: No choices exist for backend. please consider adding ATEN into max_autotune_gemm_backends config (defined in torch/_inductor/config.py) to allow at least one choice. 
2025-12-04T10:46:59.4805856Z   target: aten.mm.default
2025-12-04T10:46:59.4806179Z   args[0]: TensorBox(StorageBox(
2025-12-04T10:46:59.4806768Z     InputBuffer(name='arg1_1', layout=FixedLayout('cuda:0', torch.float32, size=[16, 24], stride=[24, 1]))
2025-12-04T10:46:59.4807366Z   ))
2025-12-04T10:46:59.4807604Z   args[1]: TensorBox(StorageBox(
2025-12-04T10:46:59.4808176Z     InputBuffer(name='arg2_1', layout=FixedLayout('cuda:0', torch.float32, size=[24, 32], stride=[32, 1]))
2025-12-04T10:46:59.4808772Z   ))
2025-12-04T10:46:59.4808895Z 
2025-12-04T10:46:59.4809604Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:46:59.4810431Z 
2025-12-04T10:46:59.4810435Z 
2025-12-04T10:46:59.4810659Z To execute this test, run the following from the base repo dir:
2025-12-04T10:46:59.4811586Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_pattern_matcher.py TestPatternMatcher.test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4812329Z 
2025-12-04T10:46:59.4812594Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:46:59.4813178Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:46:59.4813691Z ================== 1 failed, 51 deselected, 2 rerun in 4.46s ===================
2025-12-04T10:46:59.4814114Z Got exit code 1
2025-12-04T10:46:59.4814945Z FAILED CONSISTENTLY: test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm
2025-12-04T10:46:59.4815998Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:46:59.4816960Z W1204 10:46:18.869000 78473 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:46:59.4818116Z Test results will be stored in test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-a3ba5f364f03aed8.xml
2025-12-04T10:46:59.4819108Z ============================= test session starts ==============================
2025-12-04T10:46:59.4819764Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:46:59.4820340Z cachedir: .pytest_cache
2025-12-04T10:46:59.4821037Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:46:59.4821813Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:46:59.4822161Z configfile: pytest.ini
2025-12-04T10:46:59.4822863Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:46:59.4823747Z collecting ... collected 52 items / 32 deselected / 20 selected
2025-12-04T10:46:59.4824244Z stepcurrent: skipping 32 already run items.
2025-12-04T10:46:59.4824616Z Running 20 items in this shard
2025-12-04T10:46:59.4824837Z 
2025-12-04T10:46:59.4825246Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_convert PASSED [1.8224s] [  5%]
2025-12-04T10:46:59.4826189Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_cumsum PASSED [7.4792s] [ 10%]
2025-12-04T10:46:59.4827149Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_permute_pair PASSED [0.0165s] [ 15%]
2025-12-04T10:46:59.4828135Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_permute_pair_3d PASSED [0.0142s] [ 20%]
2025-12-04T10:46:59.4829118Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_view_pair PASSED [0.0214s] [ 25%]
2025-12-04T10:46:59.4830145Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_pointless_view_pair_dynamic_shapes PASSED [0.1950s] [ 30%]
2025-12-04T10:46:59.4831240Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_remove_noop_pass_with_remove_passes PASSED [0.2867s] [ 35%]
2025-12-04T10:46:59.4832272Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_remove_pointless_clones PASSED [0.1647s] [ 40%]
2025-12-04T10:46:59.4833236Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_replace_mul_zero PASSED [0.1087s] [ 45%]
2025-12-04T10:46:59.4834158Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_scaled_softmax PASSED [10.5313s] [ 50%]
2025-12-04T10:46:59.4835146Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_serialized_patterns_up_to_date PASSED [10.1510s] [ 55%]
2025-12-04T10:46:59.4836145Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_splitwithsizes_cat PASSED [1.4169s] [ 60%]
2025-12-04T10:46:59.4837118Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_stable_topological_sort PASSED [0.0040s] [ 65%]
2025-12-04T10:46:59.4838141Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case0 PASSED [0.6030s] [ 70%]
2025-12-04T10:46:59.4839204Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case1 PASSED [0.5920s] [ 75%]
2025-12-04T10:46:59.4840247Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_successful_partial_reuse_case2 PASSED [0.6164s] [ 80%]
2025-12-04T10:46:59.4841275Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_symint_pattern_matching PASSED [0.8794s] [ 85%]
2025-12-04T10:46:59.4842769Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_unfuse_bias_addmm W1204 10:46:55.000000 78473 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:46:59.4843924Z PASSED [2.1539s] [ 90%]
2025-12-04T10:46:59.4844570Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_unsuccessful_partial_reuse_case0 PASSED [0.4964s] [ 95%]
2025-12-04T10:46:59.4845656Z inductor/test_pattern_matcher.py::TestPatternMatcher::test_unsuccessful_partial_reuse_case1 PASSED [0.6324s] [100%]
2025-12-04T10:46:59.4846262Z 
2025-12-04T10:46:59.4847058Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-a3ba5f364f03aed8.xml -
2025-12-04T10:46:59.4848204Z ====================== 20 passed, 32 deselected in 38.24s ======================
2025-12-04T10:46:59.4849135Z The following tests failed consistently: ['test/inductor/test_pattern_matcher.py::TestPatternMatcher::test_original_aten_preserved_split_addmm']
2025-12-04T10:46:59.4849900Z 
2025-12-04T10:46:59.4850475Z FINISHED PRINTING LOG FILE of inductor/test_pattern_matcher 1/1 (test/test-reports/inductor.test_pattern_matcher_1.1_3ae84ddebdf6dbd7_.log)
2025-12-04T10:46:59.4851202Z 
2025-12-04T10:46:59.4851565Z Finished inductor/test_pattern_matcher 1/1 ... [2025-12-04 10:46:59.424124][5577.034031169], took 1.98min
2025-12-04T10:46:59.4852929Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-283ddf549cce6309.xml
2025-12-04T10:46:59.4995519Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-1b5ebcdca18d4e19.xml
2025-12-04T10:46:59.5298545Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-e19a61202ca16580.xml
2025-12-04T10:46:59.5630593Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-a3ba5f364f03aed8.xml
2025-12-04T10:46:59.8902850Z Uploading logs for 57119749427 to S3
2025-12-04T10:46:59.9364511Z Uploading artifacts took 0.35 seconds
2025-12-04T10:46:59.9364919Z inductor/test_pattern_matcher 1/1 failed!
2025-12-04T10:46:59.9369553Z Running inductor/test_cuda_repro 1/1 ... [2025-12-04 10:46:59.936798][5577.546706283]
2025-12-04T10:46:59.9370110Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:46:59.9374583Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_repro.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:46:59.937243]
2025-12-04T10:52:00.8857617Z 
2025-12-04T10:52:00.8858608Z PRINTING LOG FILE of inductor/test_cuda_repro 1/1 (test/test-reports/inductor.test_cuda_repro_1.1_4fd57cc505de7852_.log)
2025-12-04T10:52:00.8859976Z W1204 10:47:08.872000 79656 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:00.8861656Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a1f65e7d467aee95.xml
2025-12-04T10:52:00.8862834Z ============================= test session starts ==============================
2025-12-04T10:52:00.8863715Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:52:00.8864663Z cachedir: .pytest_cache
2025-12-04T10:52:00.8865446Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:52:00.8866611Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:52:00.8867111Z configfile: pytest.ini
2025-12-04T10:52:00.8867983Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:52:00.8868961Z collecting ... collected 96 items
2025-12-04T10:52:00.8869615Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T10:52:00.8921228Z Running 96 items in this shard: test/inductor/test_cuda_repro.py::CudaReproTests::test_3d_tiling, test/inductor/test_cuda_repro.py::CudaReproTests::test_accuracy_issue1, test/inductor/test_cuda_repro.py::CudaReproTests::test_adaptive_avg_pool3d_issue_157248, test/inductor/test_cuda_repro.py::CudaReproTests::test_atomic_add_bfloat16, test/inductor/test_cuda_repro.py::CudaReproTests::test_autotune_inplace_kernel, test/inductor/test_cuda_repro.py::CudaReproTests::test_backward_context, test/inductor/test_cuda_repro.py::CudaReproTests::test_bool_emulate_low_precision, test/inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_dynamic_dense, test/inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_epilogue, test/inductor/test_cuda_repro.py::CudaReproTests::test_cat_int8_one_kernel, test/inductor/test_cuda_repro.py::CudaReproTests::test_cpu_index, test/inductor/test_cuda_repro.py::CudaReproTests::test_deterministic_algorithms, test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses, test/inductor/test_cuda_repro.py::CudaReproTests::test_dtype_factory_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_persistent_reductions, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_shapes, test/inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_to_static_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding, test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned, test/inductor/test_cuda_repro.py::CudaReproTests::test_embedding_var_mean, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_low_precision, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_mean_ratio_chain, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_min_pow_chain, test/inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_norm_rounding, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_epilogue_fusion_with_view, test/inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs_no_size_asserts, test/inductor/test_cuda_repro.py::CudaReproTests::test_flash_attention_dynamic, test/inductor/test_cuda_repro.py::CudaReproTests::test_float64_constants, test/inductor/test_cuda_repro.py::CudaReproTests::test_float8_e8m0fnu, test/inductor/test_cuda_repro.py::CudaReproTests::test_full_copy, test/inductor/test_cuda_repro.py::CudaReproTests::test_identity_load, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_add_fallback, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_inplace_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_index_put_no_fallback_cudagraph, test/inductor/test_cuda_repro.py::CudaReproTests::test_indirect_indexing_dense_mask, test/inductor/test_cuda_repro.py::CudaReproTests::test_inductor_output_aliases_intermediate, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_add_alpha_autotune, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_buffer_autotune, test/inductor/test_cuda_repro.py::CudaReproTests::test_inplace_updates_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_input_channels_last, test/inductor/test_cuda_repro.py::CudaReproTests::test_int64_index_intermediate, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue100806, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue103461, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue103481, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue104759, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_1input, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_2input, test/inductor/test_cuda_repro.py::CudaReproTests::test_issue_103924, test/inductor/test_cuda_repro.py::CudaReproTests::test_libdevice_routing, test/inductor/test_cuda_repro.py::CudaReproTests::test_linear_cpu_input, test/inductor/test_cuda_repro.py::CudaReproTests::test_linear_with_zero_infeature_size, test/inductor/test_cuda_repro.py::CudaReproTests::test_lookup_seed_backward, test/inductor/test_cuda_repro.py::CudaReproTests::test_max_autotune_nograd, test/inductor/test_cuda_repro.py::CudaReproTests::test_memory_history_inductor, test/inductor/test_cuda_repro.py::CudaReproTests::test_mm_out_dtype_compile, test/inductor/test_cuda_repro.py::CudaReproTests::test_multi_output_layout_fallback, test/inductor/test_cuda_repro.py::CudaReproTests::test_mutated_aligned_tensor, test/inductor/test_cuda_repro.py::CudaReproTests::test_negative_arange_dynamic_shapes, test/inductor/test_cuda_repro.py::CudaReproTests::test_no_device_idx_repro_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_non_commutative_scan_op, test/inductor/test_cuda_repro.py::CudaReproTests::test_non_contiguous_unaligned_input_indices, test/inductor/test_cuda_repro.py::CudaReproTests::test_normalize_norm_leq_one, test/inductor/test_cuda_repro.py::CudaReproTests::test_not_initializing_wrong_device, test/inductor/test_cuda_repro.py::CudaReproTests::test_permute_fusion, test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile, test/inductor/test_cuda_repro.py::CudaReproTests::test_red_dtype_mismatch, test/inductor/test_cuda_repro.py::CudaReproTests::test_reflection_pad_loop_order, test/inductor/test_cuda_repro.py::CudaReproTests::test_repeated_masked_load, test/inductor/test_cuda_repro.py::CudaReproTests::test_scalar_triton_index, test/inductor/test_cuda_repro.py::CudaReproTests::test_scaled_dot_product_efficient_attention_backward, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_scatter_index_not_wrapped, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape0_quantiles_strides0_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape1_quantiles_strides1_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape2_quantiles_strides2_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape3_quantiles_strides3_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape4_quantiles_strides4_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape5_quantiles_strides5_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape6_quantiles_strides6_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape7_quantiles_strides7_batch_size_16, test/inductor/test_cuda_repro.py::CudaReproTests::test_selecsls42b_misaligned_address, test/inductor/test_cuda_repro.py::CudaReproTests::test_simplify_dims, test/inductor/test_cuda_repro.py::CudaReproTests::test_sort_stride_issue, test/inductor/test_cuda_repro.py::CudaReproTests::test_sorted_masks, test/inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_channels_last, test/inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_transposed, test/inductor/test_cuda_repro.py::CudaReproTests::test_triton_interpret, test/inductor/test_cuda_repro.py::CudaReproTests::test_truediv_base_not_bitwise_equivalent, test/inductor/test_cuda_repro.py::CudaReproTests::test_truediv_emulate_divison_rounding, test/inductor/test_cuda_repro.py::CudaReproTests::test_uint_view_copy, 
test/inductor/test_cuda_repro.py::CudaReproTests::test_unspec_inputs_interop, test/inductor/test_cuda_repro.py::CudaReproTests::test_unused_cpu_input_cudagraphs, test/inductor/test_cuda_repro.py::CudaReproTests::test_view_replay_padding_issue_163328, test/inductor/test_cuda_repro.py::CudaReproTests::test_xlnet_lm_stride_repro
2025-12-04T10:52:00.8960292Z 
2025-12-04T10:52:00.8960625Z inductor/test_cuda_repro.py::CudaReproTests::test_3d_tiling PASSED [3.2406s] [  1%]
2025-12-04T10:52:00.8961784Z inductor/test_cuda_repro.py::CudaReproTests::test_accuracy_issue1 W1204 10:47:14.743000 79656 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs
2025-12-04T10:52:00.8963237Z W1204 10:47:15.243000 79656 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:00.8963917Z PASSED [2.7562s] [  2%]
2025-12-04T10:52:00.8964577Z inductor/test_cuda_repro.py::CudaReproTests::test_adaptive_avg_pool3d_issue_157248 PASSED [3.1706s] [  3%]
2025-12-04T10:52:00.8965751Z inductor/test_cuda_repro.py::CudaReproTests::test_atomic_add_bfloat16 SKIPPED [0.0003s] (bfloat16 atomic add is only supported in fbcode today #97016) [  4%]
2025-12-04T10:52:00.8966880Z inductor/test_cuda_repro.py::CudaReproTests::test_autotune_inplace_kernel PASSED [0.1097s] [  5%]
2025-12-04T10:52:00.8967734Z inductor/test_cuda_repro.py::CudaReproTests::test_backward_context PASSED [0.5631s] [  6%]
2025-12-04T10:52:00.8968597Z inductor/test_cuda_repro.py::CudaReproTests::test_bool_emulate_low_precision PASSED [0.5244s] [  7%]
2025-12-04T10:52:00.8969486Z inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_dynamic_dense PASSED [0.9228s] [  8%]
2025-12-04T10:52:00.8970635Z inductor/test_cuda_repro.py::CudaReproTests::test_bucketize_epilogue SKIPPED [0.0003s] (Skipping triton backend only since not big GPU (not enough SM)) [  9%]
2025-12-04T10:52:00.8971779Z inductor/test_cuda_repro.py::CudaReproTests::test_cat_int8_one_kernel PASSED [0.9067s] [ 10%]
2025-12-04T10:52:00.8972576Z inductor/test_cuda_repro.py::CudaReproTests::test_cpu_index PASSED [0.9723s] [ 11%]
2025-12-04T10:52:00.8973395Z inductor/test_cuda_repro.py::CudaReproTests::test_deterministic_algorithms PASSED [0.5192s] [ 12%]
2025-12-04T10:52:00.8974393Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses ('RERUN', {'yellow': True}) [1.3094s] [ 13%]
2025-12-04T10:52:00.8975530Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses You have not run this instance of FileCheck!
2025-12-04T10:52:00.8976277Z FileCheck checks:
2025-12-04T10:52:00.8976549Z 	CHECK-NOT: in_out
2025-12-04T10:52:00.8976827Z ('RERUN', {'yellow': True}) [1.1625s] [ 13%]
2025-12-04T10:52:00.8977589Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses You have not run this instance of FileCheck!
2025-12-04T10:52:00.8978334Z FileCheck checks:
2025-12-04T10:52:00.8978581Z 	CHECK-NOT: in_out
2025-12-04T10:52:00.8978960Z FAILED [1.1610s] [ 13%]You have not run this instance of FileCheck!
2025-12-04T10:52:00.8979402Z FileCheck checks:
2025-12-04T10:52:00.8979649Z 	CHECK-NOT: in_out
2025-12-04T10:52:00.8979818Z 
2025-12-04T10:52:00.8979823Z 
2025-12-04T10:52:00.8979965Z ==================================== RERUNS ====================================
2025-12-04T10:52:00.8980524Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________
2025-12-04T10:52:00.8981048Z Traceback (most recent call last):
2025-12-04T10:52:00.8981780Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.8982568Z     FileCheck().check_not("in_out").run(code[0])
2025-12-04T10:52:00.8982962Z IndexError: list index out of range
2025-12-04T10:52:00.8983193Z 
2025-12-04T10:52:00.8983404Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.8984264Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.8984917Z 
2025-12-04T10:52:00.8985179Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.8985801Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.8986257Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.8987603Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.8988065Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.8988771Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.8989338Z graph_break []
2025-12-04T10:52:00.8989623Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.8990097Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.8991158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.8992225Z   warnings.warn(
2025-12-04T10:52:00.8992656Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________
2025-12-04T10:52:00.8993186Z Traceback (most recent call last):
2025-12-04T10:52:00.8993919Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.8994707Z     FileCheck().check_not("in_out").run(code[0])
2025-12-04T10:52:00.8995105Z IndexError: list index out of range
2025-12-04T10:52:00.8995334Z 
2025-12-04T10:52:00.8995543Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.8996396Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.8997042Z 
2025-12-04T10:52:00.8997300Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.8997920Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.8998375Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.8998705Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.8999138Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.8999818Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9000393Z graph_break []
2025-12-04T10:52:00.9000678Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9001371Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9002524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9003478Z   warnings.warn(
2025-12-04T10:52:00.9003854Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9004315Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9004654Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9005092Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9005787Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9006354Z graph_break []
2025-12-04T10:52:00.9006646Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9007115Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9008169Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9009114Z   warnings.warn(
2025-12-04T10:52:00.9009419Z =================================== FAILURES ===================================
2025-12-04T10:52:00.9009973Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________
2025-12-04T10:52:00.9010483Z Traceback (most recent call last):
2025-12-04T10:52:00.9011224Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9012009Z     FileCheck().check_not("in_out").run(code[0])
2025-12-04T10:52:00.9012388Z IndexError: list index out of range
2025-12-04T10:52:00.9012632Z 
2025-12-04T10:52:00.9013000Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9013867Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9014506Z 
2025-12-04T10:52:00.9014787Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9015398Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9015869Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9016302Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9016720Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9017416Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9017992Z graph_break []
2025-12-04T10:52:00.9018281Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9018747Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9019824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9020774Z   warnings.warn(
2025-12-04T10:52:00.9021139Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9021605Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9021937Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9022373Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9023049Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9023621Z graph_break []
2025-12-04T10:52:00.9023904Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9024362Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9025426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9026365Z   warnings.warn(
2025-12-04T10:52:00.9026733Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9027184Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9027511Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9027935Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9028615Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9029186Z graph_break []
2025-12-04T10:52:00.9029475Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9029932Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9031002Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9031950Z   warnings.warn(
2025-12-04T10:52:00.9032818Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a1f65e7d467aee95.xml -
2025-12-04T10:52:00.9033817Z =========================== short test summary info ============================
2025-12-04T10:52:00.9034677Z FAILED [1.1610s] inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses - IndexError: list index out of range
2025-12-04T10:52:00.9035365Z 
2025-12-04T10:52:00.9035578Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9036433Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9037073Z 
2025-12-04T10:52:00.9037337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9037988Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:52:00.9038511Z ============== 1 failed, 10 passed, 2 skipped, 2 rerun in 17.38s ===============
2025-12-04T10:52:00.9038956Z Got exit code 1
2025-12-04T10:52:00.9039207Z Retrying single test...
2025-12-04T10:52:00.9039830Z W1204 10:47:39.481000 80332 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:00.9040935Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-e6d248469cfc058f.xml
2025-12-04T10:52:00.9041817Z ============================= test session starts ==============================
2025-12-04T10:52:00.9042542Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:52:00.9043136Z cachedir: .pytest_cache
2025-12-04T10:52:00.9043832Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:52:00.9044594Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:52:00.9044949Z configfile: pytest.ini
2025-12-04T10:52:00.9045665Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:52:00.9046525Z collecting ... collected 96 items / 95 deselected / 1 selected
2025-12-04T10:52:00.9047463Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9048300Z Running 1 items in this shard
2025-12-04T10:52:00.9048505Z 
2025-12-04T10:52:00.9049352Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses W1204 10:47:44.243000 80332 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:00.9050459Z You have not run this instance of FileCheck!
2025-12-04T10:52:00.9050813Z FileCheck checks:
2025-12-04T10:52:00.9051078Z 	CHECK-NOT: in_out
2025-12-04T10:52:00.9051366Z ('RERUN', {'yellow': True}) [4.7791s] [100%]
2025-12-04T10:52:00.9052101Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses ('RERUN', {'yellow': True}) [1.1572s] [100%]
2025-12-04T10:52:00.9053243Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses You have not run this instance of FileCheck!
2025-12-04T10:52:00.9053985Z FileCheck checks:
2025-12-04T10:52:00.9054235Z 	CHECK-NOT: in_out
2025-12-04T10:52:00.9054496Z FAILED [1.1580s] [100%]
2025-12-04T10:52:00.9054668Z 
2025-12-04T10:52:00.9054820Z ==================================== RERUNS ====================================
2025-12-04T10:52:00.9055373Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________
2025-12-04T10:52:00.9055884Z Traceback (most recent call last):
2025-12-04T10:52:00.9056622Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9057410Z     FileCheck().check_not("in_out").run(code[0])
2025-12-04T10:52:00.9057790Z IndexError: list index out of range
2025-12-04T10:52:00.9058033Z 
2025-12-04T10:52:00.9058243Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9059093Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9059729Z 
2025-12-04T10:52:00.9060000Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9060610Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9061076Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9061405Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9061942Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9062633Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9063175Z graph_break []
2025-12-04T10:52:00.9063468Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9063932Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9065007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9065962Z   warnings.warn(
2025-12-04T10:52:00.9066375Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________
2025-12-04T10:52:00.9066970Z Traceback (most recent call last):
2025-12-04T10:52:00.9067706Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9068488Z     FileCheck().check_not("in_out").run(code[0])
2025-12-04T10:52:00.9068868Z IndexError: list index out of range
2025-12-04T10:52:00.9069110Z 
2025-12-04T10:52:00.9069325Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9070175Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9070811Z 
2025-12-04T10:52:00.9071083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9071685Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9072149Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9072481Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9073023Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9073732Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9074185Z graph_break []
2025-12-04T10:52:00.9074479Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9074939Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9076015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9076965Z   warnings.warn(
2025-12-04T10:52:00.9077326Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9077795Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9078130Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9078560Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9079246Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9079824Z graph_break []
2025-12-04T10:52:00.9080114Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9080576Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9081654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9082679Z   warnings.warn(
2025-12-04T10:52:00.9082990Z =================================== FAILURES ===================================
2025-12-04T10:52:00.9083535Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________
2025-12-04T10:52:00.9084062Z Traceback (most recent call last):
2025-12-04T10:52:00.9084806Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9085590Z     FileCheck().check_not("in_out").run(code[0])
2025-12-04T10:52:00.9085985Z IndexError: list index out of range
2025-12-04T10:52:00.9086212Z 
2025-12-04T10:52:00.9086433Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9087281Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9087922Z 
2025-12-04T10:52:00.9088309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9088927Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9089397Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9089714Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9090264Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9090956Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9091530Z graph_break []
2025-12-04T10:52:00.9091801Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9092280Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9093353Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9094290Z   warnings.warn(
2025-12-04T10:52:00.9094667Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9095132Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9095465Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9095881Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9096570Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9097144Z graph_break []
2025-12-04T10:52:00.9097421Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9097891Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9098957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9099903Z   warnings.warn(
2025-12-04T10:52:00.9100266Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9100731Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9101232Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9101653Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9102353Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9102934Z graph_break []
2025-12-04T10:52:00.9103209Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9103683Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9104749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9105695Z   warnings.warn(
2025-12-04T10:52:00.9106549Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-e6d248469cfc058f.xml -
2025-12-04T10:52:00.9107561Z =========================== short test summary info ============================
2025-12-04T10:52:00.9108417Z FAILED [1.1580s] inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses - IndexError: list index out of range
2025-12-04T10:52:00.9109086Z 
2025-12-04T10:52:00.9109312Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9110145Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9110798Z 
2025-12-04T10:52:00.9111058Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9111635Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:52:00.9112148Z ================== 1 failed, 95 deselected, 2 rerun in 7.13s ===================
2025-12-04T10:52:00.9112615Z You have not run this instance of FileCheck!
2025-12-04T10:52:00.9112983Z FileCheck checks:
2025-12-04T10:52:00.9113376Z 	CHECK-NOT: in_out
2025-12-04T10:52:00.9113626Z Got exit code 1
2025-12-04T10:52:00.9113888Z Retrying single test...
2025-12-04T10:52:00.9114510Z W1204 10:47:59.298000 80502 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:00.9115593Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-f3f2e4b24ff37d87.xml
2025-12-04T10:52:00.9116419Z ============================= test session starts ==============================
2025-12-04T10:52:00.9117148Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:52:00.9117736Z cachedir: .pytest_cache
2025-12-04T10:52:00.9118412Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:52:00.9119173Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:52:00.9119519Z configfile: pytest.ini
2025-12-04T10:52:00.9120236Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:52:00.9121097Z collecting ... collected 96 items / 95 deselected / 1 selected
2025-12-04T10:52:00.9122031Z stepcurrent: skipping 12 already run items. Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9122935Z Running 1 items in this shard
2025-12-04T10:52:00.9123142Z 
2025-12-04T10:52:00.9124005Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses W1204 10:48:04.081000 80502 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:00.9125095Z ('RERUN', {'yellow': True}) [4.7899s] [100%]
2025-12-04T10:52:00.9125864Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses You have not run this instance of FileCheck!
2025-12-04T10:52:00.9126612Z FileCheck checks:
2025-12-04T10:52:00.9126866Z 	CHECK-NOT: in_out
2025-12-04T10:52:00.9127155Z ('RERUN', {'yellow': True}) [1.1617s] [100%]
2025-12-04T10:52:00.9127920Z inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses You have not run this instance of FileCheck!
2025-12-04T10:52:00.9128661Z FileCheck checks:
2025-12-04T10:52:00.9128908Z 	CHECK-NOT: in_out
2025-12-04T10:52:00.9129167Z FAILED [1.1652s] [100%]
2025-12-04T10:52:00.9129338Z 
2025-12-04T10:52:00.9129490Z ==================================== RERUNS ====================================
2025-12-04T10:52:00.9130031Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________
2025-12-04T10:52:00.9130554Z Traceback (most recent call last):
2025-12-04T10:52:00.9131298Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9132074Z     FileCheck().check_not("in_out").run(code[0])
2025-12-04T10:52:00.9132475Z IndexError: list index out of range
2025-12-04T10:52:00.9132719Z 
2025-12-04T10:52:00.9132930Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9133788Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9134430Z 
2025-12-04T10:52:00.9134691Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9135316Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9135792Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9136130Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9136677Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9137376Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9137830Z graph_break []
2025-12-04T10:52:00.9138099Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9138709Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9139788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9140742Z   warnings.warn(
2025-12-04T10:52:00.9141157Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________
2025-12-04T10:52:00.9158834Z Traceback (most recent call last):
2025-12-04T10:52:00.9159890Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9160694Z     FileCheck().check_not("in_out").run(code[0])
2025-12-04T10:52:00.9161105Z IndexError: list index out of range
2025-12-04T10:52:00.9161340Z 
2025-12-04T10:52:00.9161555Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9162508Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9163151Z 
2025-12-04T10:52:00.9163432Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9164067Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9164535Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9164868Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9165429Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9166118Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9166575Z graph_break []
2025-12-04T10:52:00.9166860Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9167322Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9168394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9169325Z   warnings.warn(
2025-12-04T10:52:00.9169694Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9170142Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9170467Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9170881Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9171554Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9172132Z graph_break []
2025-12-04T10:52:00.9172405Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9172861Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9173924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9174869Z   warnings.warn(
2025-12-04T10:52:00.9175176Z =================================== FAILURES ===================================
2025-12-04T10:52:00.9175714Z ______________ CudaReproTests.test_dont_inplace_disjoint_accesses ______________
2025-12-04T10:52:00.9176241Z Traceback (most recent call last):
2025-12-04T10:52:00.9176984Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 1745, in test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9177759Z     FileCheck().check_not("in_out").run(code[0])
2025-12-04T10:52:00.9178151Z IndexError: list index out of range
2025-12-04T10:52:00.9178391Z 
2025-12-04T10:52:00.9178602Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9179455Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9180091Z 
2025-12-04T10:52:00.9180351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9181058Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9181530Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9181865Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9182407Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9183106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9183563Z graph_break []
2025-12-04T10:52:00.9183837Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9184377Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9185453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9186405Z   warnings.warn(
2025-12-04T10:52:00.9186768Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9187232Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9187563Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9187979Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9188672Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9189243Z graph_break []
2025-12-04T10:52:00.9189516Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9189989Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9191063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9192010Z   warnings.warn(
2025-12-04T10:52:00.9192372Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9192833Z frames [('total', 2), ('ok', 2)]
2025-12-04T10:52:00.9193170Z stats [('calls_captured', 66)]
2025-12-04T10:52:00.9193585Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T10:52:00.9194275Z inductor [('pattern_matcher_nodes', 13), ('pattern_matcher_count', 12), ('fxgraph_cache_miss', 1)]
2025-12-04T10:52:00.9194851Z graph_break []
2025-12-04T10:52:00.9195136Z aten_mm_info [('aten.mm_32768_2048_2048', 3)]
2025-12-04T10:52:00.9195594Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9196665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T10:52:00.9197619Z   warnings.warn(
2025-12-04T10:52:00.9198476Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-f3f2e4b24ff37d87.xml -
2025-12-04T10:52:00.9199490Z =========================== short test summary info ============================
2025-12-04T10:52:00.9200357Z FAILED [1.1652s] inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses - IndexError: list index out of range
2025-12-04T10:52:00.9201313Z 
2025-12-04T10:52:00.9201547Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9202453Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9203105Z 
2025-12-04T10:52:00.9203368Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9203960Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:52:00.9204476Z ================== 1 failed, 95 deselected, 2 rerun in 7.15s ===================
2025-12-04T10:52:00.9204943Z You have not run this instance of FileCheck!
2025-12-04T10:52:00.9205319Z FileCheck checks:
2025-12-04T10:52:00.9205584Z 	CHECK-NOT: in_out
2025-12-04T10:52:00.9205834Z Got exit code 1
2025-12-04T10:52:00.9206622Z FAILED CONSISTENTLY: test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses
2025-12-04T10:52:00.9207595Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:52:00.9208574Z W1204 10:48:19.136000 80672 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:00.9209655Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-381f6a62351f53ee.xml
2025-12-04T10:52:00.9210576Z ============================= test session starts ==============================
2025-12-04T10:52:00.9211225Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:52:00.9211820Z cachedir: .pytest_cache
2025-12-04T10:52:00.9212505Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:52:00.9213281Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:52:00.9213630Z configfile: pytest.ini
2025-12-04T10:52:00.9214386Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:52:00.9215351Z collecting ... collected 96 items / 13 deselected / 83 selected
2025-12-04T10:52:00.9215843Z stepcurrent: skipping 13 already run items.
2025-12-04T10:52:00.9216225Z Running 83 items in this shard
2025-12-04T10:52:00.9216438Z 
2025-12-04T10:52:00.9216800Z inductor/test_cuda_repro.py::CudaReproTests::test_dtype_factory_issue PASSED [2.1412s] [  1%]
2025-12-04T10:52:00.9217703Z inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_persistent_reductions PASSED [1.3498s] [  2%]
2025-12-04T10:52:00.9218572Z inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_shapes PASSED [1.2510s] [  3%]
2025-12-04T10:52:00.9219422Z inductor/test_cuda_repro.py::CudaReproTests::test_dynamic_to_static_cudagraphs PASSED [0.7824s] [  4%]
2025-12-04T10:52:00.9220317Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding PASSED [0.2922s] [  6%]
2025-12-04T10:52:00.9221328Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned ('RERUN', {'yellow': True}) [0.0520s] [  7%]
2025-12-04T10:52:00.9222468Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned ('RERUN', {'yellow': True}) [0.0241s] [  7%]
2025-12-04T10:52:00.9223510Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned FAILED [0.0230s] [  7%]
2025-12-04T10:52:00.9224075Z 
2025-12-04T10:52:00.9224216Z ==================================== RERUNS ====================================
2025-12-04T10:52:00.9224768Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________
2025-12-04T10:52:00.9225299Z Traceback (most recent call last):
2025-12-04T10:52:00.9226038Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:00.9226850Z     out, code = run_and_get_code(f_compiled, *inputs)
2025-12-04T10:52:00.9227616Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:52:00.9228333Z     result = fn(*args, **kwargs)
2025-12-04T10:52:00.9229010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T10:52:00.9229734Z     return fn(*args, **kwargs)
2025-12-04T10:52:00.9230406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__
2025-12-04T10:52:00.9231117Z     result = self._torchdynamo_orig_backend(
2025-12-04T10:52:00.9231836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__
2025-12-04T10:52:00.9232544Z     result = self._inner_convert(
2025-12-04T10:52:00.9233305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__
2025-12-04T10:52:00.9233986Z     result = _compile(
2025-12-04T10:52:00.9234626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile
2025-12-04T10:52:00.9235461Z     guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
2025-12-04T10:52:00.9236282Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
2025-12-04T10:52:00.9237066Z     return function(*args, **kwargs)
2025-12-04T10:52:00.9237795Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner
2025-12-04T10:52:00.9238565Z     return _compile_inner(code, one_graph, hooks)
2025-12-04T10:52:00.9239319Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner
2025-12-04T10:52:00.9240071Z     dynamo_output = compile_frame(
2025-12-04T10:52:00.9240786Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
2025-12-04T10:52:00.9241638Z     bytecode, tracer_output = transform_code_object(code, transform)
2025-12-04T10:52:00.9242650Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
2025-12-04T10:52:00.9243579Z     tracer_output = transformations(instructions, code_options)
2025-12-04T10:52:00.9244388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
2025-12-04T10:52:00.9245085Z     tracer_output = trace_frame(
2025-12-04T10:52:00.9245734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
2025-12-04T10:52:00.9246411Z     return fn(*args, **kwargs)
2025-12-04T10:52:00.9247087Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame
2025-12-04T10:52:00.9247772Z     run_tracer()
2025-12-04T10:52:00.9248389Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer
2025-12-04T10:52:00.9249083Z     tracer.run()
2025-12-04T10:52:00.9249675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run
2025-12-04T10:52:00.9250365Z     while self.step():
2025-12-04T10:52:00.9250997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step
2025-12-04T10:52:00.9251740Z     self.dispatch_table[inst.opcode](self, inst)
2025-12-04T10:52:00.9252473Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper
2025-12-04T10:52:00.9253182Z     return inner_fn(self, inst)
2025-12-04T10:52:00.9253916Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW
2025-12-04T10:52:00.9254677Z     self.call_function(fn, args, kwargs)
2025-12-04T10:52:00.9255426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function
2025-12-04T10:52:00.9256320Z     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
2025-12-04T10:52:00.9257228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward
2025-12-04T10:52:00.9258055Z     return getattr(self.realize(), name)(*args, **kwargs)
2025-12-04T10:52:00.9258858Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function
2025-12-04T10:52:00.9259613Z     tensor_variable = wrap_fx_proxy(
2025-12-04T10:52:00.9260362Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy
2025-12-04T10:52:00.9261287Z     return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
2025-12-04T10:52:00.9262169Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls
2025-12-04T10:52:00.9262949Z     out = _wrap_fx_proxy(
2025-12-04T10:52:00.9263643Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy
2025-12-04T10:52:00.9264547Z     example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
2025-12-04T10:52:00.9265454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value
2025-12-04T10:52:00.9266308Z     raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
2025-12-04T10:52:00.9267144Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value
2025-12-04T10:52:00.9267853Z     ret_val = wrap_fake_exception(
2025-12-04T10:52:00.9268565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception
2025-12-04T10:52:00.9269267Z     return fn()
2025-12-04T10:52:00.9269820Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in <lambda>
2025-12-04T10:52:00.9270570Z     lambda: run_node(tx.output, node, args, kwargs, nnmodule)
2025-12-04T10:52:00.9271311Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node
2025-12-04T10:52:00.9272183Z     raise RuntimeError(make_error_message(e)).with_traceback(
2025-12-04T10:52:00.9272927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node
2025-12-04T10:52:00.9273706Z     return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9276180Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.')
2025-12-04T10:52:00.9278438Z 
2025-12-04T10:52:00.9278538Z from user code:
2025-12-04T10:52:00.9279042Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:00.9279632Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:00.9279894Z 
2025-12-04T10:52:00.9280598Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:00.9281435Z 
2025-12-04T10:52:00.9281440Z 
2025-12-04T10:52:00.9281656Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9282591Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:00.9283246Z 
2025-12-04T10:52:00.9283523Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9284134Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9284601Z frames [('total', 1)]
2025-12-04T10:52:00.9284999Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9286505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:00.9288007Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9290009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:00.9291923Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9293410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:00.9294964Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9296443Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:00.9297918Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9299397Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:00.9300998Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9302497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:00.9303989Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9304611Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________
2025-12-04T10:52:00.9305128Z Traceback (most recent call last):
2025-12-04T10:52:00.9305882Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:00.9306685Z     out, code = run_and_get_code(f_compiled, *inputs)
2025-12-04T10:52:00.9307439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:52:00.9308142Z     result = fn(*args, **kwargs)
2025-12-04T10:52:00.9308842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T10:52:00.9309565Z     return fn(*args, **kwargs)
2025-12-04T10:52:00.9310232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__
2025-12-04T10:52:00.9310946Z     result = self._torchdynamo_orig_backend(
2025-12-04T10:52:00.9311669Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__
2025-12-04T10:52:00.9312373Z     result = self._inner_convert(
2025-12-04T10:52:00.9313035Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__
2025-12-04T10:52:00.9313730Z     result = _compile(
2025-12-04T10:52:00.9314358Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile
2025-12-04T10:52:00.9315184Z     guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
2025-12-04T10:52:00.9316005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
2025-12-04T10:52:00.9316722Z     return function(*args, **kwargs)
2025-12-04T10:52:00.9317446Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner
2025-12-04T10:52:00.9318217Z     return _compile_inner(code, one_graph, hooks)
2025-12-04T10:52:00.9319103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner
2025-12-04T10:52:00.9319856Z     dynamo_output = compile_frame(
2025-12-04T10:52:00.9320570Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
2025-12-04T10:52:00.9321405Z     bytecode, tracer_output = transform_code_object(code, transform)
2025-12-04T10:52:00.9322419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
2025-12-04T10:52:00.9323524Z     tracer_output = transformations(instructions, code_options)
2025-12-04T10:52:00.9324327Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
2025-12-04T10:52:00.9325024Z     tracer_output = trace_frame(
2025-12-04T10:52:00.9325680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
2025-12-04T10:52:00.9326361Z     return fn(*args, **kwargs)
2025-12-04T10:52:00.9327021Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame
2025-12-04T10:52:00.9327719Z     run_tracer()
2025-12-04T10:52:00.9328336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer
2025-12-04T10:52:00.9329031Z     tracer.run()
2025-12-04T10:52:00.9329629Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run
2025-12-04T10:52:00.9330318Z     while self.step():
2025-12-04T10:52:00.9330949Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step
2025-12-04T10:52:00.9331678Z     self.dispatch_table[inst.opcode](self, inst)
2025-12-04T10:52:00.9332419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper
2025-12-04T10:52:00.9333130Z     return inner_fn(self, inst)
2025-12-04T10:52:00.9333859Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW
2025-12-04T10:52:00.9334623Z     self.call_function(fn, args, kwargs)
2025-12-04T10:52:00.9335372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function
2025-12-04T10:52:00.9336261Z     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
2025-12-04T10:52:00.9337177Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward
2025-12-04T10:52:00.9338006Z     return getattr(self.realize(), name)(*args, **kwargs)
2025-12-04T10:52:00.9338809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function
2025-12-04T10:52:00.9339567Z     tensor_variable = wrap_fx_proxy(
2025-12-04T10:52:00.9340300Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy
2025-12-04T10:52:00.9341156Z     return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
2025-12-04T10:52:00.9342028Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls
2025-12-04T10:52:00.9342805Z     out = _wrap_fx_proxy(
2025-12-04T10:52:00.9343501Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy
2025-12-04T10:52:00.9344409Z     example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
2025-12-04T10:52:00.9345247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value
2025-12-04T10:52:00.9346100Z     raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
2025-12-04T10:52:00.9347018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value
2025-12-04T10:52:00.9347723Z     ret_val = wrap_fake_exception(
2025-12-04T10:52:00.9348421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception
2025-12-04T10:52:00.9349112Z     return fn()
2025-12-04T10:52:00.9349681Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in <lambda>
2025-12-04T10:52:00.9350488Z     lambda: run_node(tx.output, node, args, kwargs, nnmodule)
2025-12-04T10:52:00.9351231Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node
2025-12-04T10:52:00.9351977Z     raise RuntimeError(make_error_message(e)).with_traceback(
2025-12-04T10:52:00.9352721Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node
2025-12-04T10:52:00.9353492Z     return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9355965Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.')
2025-12-04T10:52:00.9358204Z 
2025-12-04T10:52:00.9358316Z from user code:
2025-12-04T10:52:00.9358932Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:00.9359538Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:00.9359790Z 
2025-12-04T10:52:00.9360512Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:00.9361340Z 
2025-12-04T10:52:00.9361345Z 
2025-12-04T10:52:00.9361570Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9362494Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:00.9363161Z 
2025-12-04T10:52:00.9363423Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9364047Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9364503Z frames [('total', 1)]
2025-12-04T10:52:00.9364882Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9366405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:00.9367909Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9369835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:00.9371739Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9373220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:00.9374787Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9376270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:00.9377747Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9379215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:00.9380757Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9382252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:00.9383741Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9384290Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9384751Z frames [('total', 1)]
2025-12-04T10:52:00.9385147Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9386639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:00.9388122Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9390055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:00.9391961Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9393442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:00.9394923Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9396369Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:00.9397841Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9399321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:00.9400800Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9402485Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:00.9403984Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9404473Z =================================== FAILURES ===================================
2025-12-04T10:52:00.9405032Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________
2025-12-04T10:52:00.9405551Z Traceback (most recent call last):
2025-12-04T10:52:00.9406416Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:00.9407234Z     out, code = run_and_get_code(f_compiled, *inputs)
2025-12-04T10:52:00.9407993Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:52:00.9408696Z     result = fn(*args, **kwargs)
2025-12-04T10:52:00.9409394Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T10:52:00.9410212Z     return fn(*args, **kwargs)
2025-12-04T10:52:00.9410870Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__
2025-12-04T10:52:00.9411597Z     result = self._torchdynamo_orig_backend(
2025-12-04T10:52:00.9412323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__
2025-12-04T10:52:00.9413028Z     result = self._inner_convert(
2025-12-04T10:52:00.9413697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__
2025-12-04T10:52:00.9414386Z     result = _compile(
2025-12-04T10:52:00.9415018Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile
2025-12-04T10:52:00.9415831Z     guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
2025-12-04T10:52:00.9416663Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
2025-12-04T10:52:00.9417379Z     return function(*args, **kwargs)
2025-12-04T10:52:00.9418097Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner
2025-12-04T10:52:00.9418856Z     return _compile_inner(code, one_graph, hooks)
2025-12-04T10:52:00.9419621Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner
2025-12-04T10:52:00.9420372Z     dynamo_output = compile_frame(
2025-12-04T10:52:00.9421076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
2025-12-04T10:52:00.9421929Z     bytecode, tracer_output = transform_code_object(code, transform)
2025-12-04T10:52:00.9422885Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
2025-12-04T10:52:00.9423822Z     tracer_output = transformations(instructions, code_options)
2025-12-04T10:52:00.9424617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
2025-12-04T10:52:00.9425333Z     tracer_output = trace_frame(
2025-12-04T10:52:00.9425982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
2025-12-04T10:52:00.9426657Z     return fn(*args, **kwargs)
2025-12-04T10:52:00.9427322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame
2025-12-04T10:52:00.9428020Z     run_tracer()
2025-12-04T10:52:00.9428635Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer
2025-12-04T10:52:00.9429316Z     tracer.run()
2025-12-04T10:52:00.9429922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run
2025-12-04T10:52:00.9430608Z     while self.step():
2025-12-04T10:52:00.9431246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step
2025-12-04T10:52:00.9431975Z     self.dispatch_table[inst.opcode](self, inst)
2025-12-04T10:52:00.9432717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper
2025-12-04T10:52:00.9433430Z     return inner_fn(self, inst)
2025-12-04T10:52:00.9434238Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW
2025-12-04T10:52:00.9435017Z     self.call_function(fn, args, kwargs)
2025-12-04T10:52:00.9435772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function
2025-12-04T10:52:00.9436671Z     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
2025-12-04T10:52:00.9437567Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward
2025-12-04T10:52:00.9438466Z     return getattr(self.realize(), name)(*args, **kwargs)
2025-12-04T10:52:00.9439266Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function
2025-12-04T10:52:00.9440017Z     tensor_variable = wrap_fx_proxy(
2025-12-04T10:52:00.9440754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy
2025-12-04T10:52:00.9441606Z     return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
2025-12-04T10:52:00.9442545Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls
2025-12-04T10:52:00.9443307Z     out = _wrap_fx_proxy(
2025-12-04T10:52:00.9444015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy
2025-12-04T10:52:00.9444920Z     example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
2025-12-04T10:52:00.9445766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value
2025-12-04T10:52:00.9446613Z     raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
2025-12-04T10:52:00.9447469Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value
2025-12-04T10:52:00.9448172Z     ret_val = wrap_fake_exception(
2025-12-04T10:52:00.9448875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception
2025-12-04T10:52:00.9449565Z     return fn()
2025-12-04T10:52:00.9450131Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in <lambda>
2025-12-04T10:52:00.9450881Z     lambda: run_node(tx.output, node, args, kwargs, nnmodule)
2025-12-04T10:52:00.9451610Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node
2025-12-04T10:52:00.9452363Z     raise RuntimeError(make_error_message(e)).with_traceback(
2025-12-04T10:52:00.9453103Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node
2025-12-04T10:52:00.9453864Z     return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9456332Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.')
2025-12-04T10:52:00.9458583Z 
2025-12-04T10:52:00.9458682Z from user code:
2025-12-04T10:52:00.9459197Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:00.9459921Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:00.9460173Z 
2025-12-04T10:52:00.9460890Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:00.9461713Z 
2025-12-04T10:52:00.9461808Z 
2025-12-04T10:52:00.9462025Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9462909Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:00.9463580Z 
2025-12-04T10:52:00.9463845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9464471Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9464979Z frames [('total', 1)]
2025-12-04T10:52:00.9465373Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9466880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:00.9468402Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9470334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:00.9472236Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9473724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:00.9475204Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9476732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:00.9478207Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9479675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:00.9481157Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9482714Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:00.9484212Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9484765Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9485224Z frames [('total', 1)]
2025-12-04T10:52:00.9485617Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9487108Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:00.9488608Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9490538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:00.9492516Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9494000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:00.9495470Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9496943Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:00.9498475Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9499956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:00.9501596Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9503079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:00.9504564Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9505126Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9505583Z frames [('total', 1)]
2025-12-04T10:52:00.9506460Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-381f6a62351f53ee.xml -
2025-12-04T10:52:00.9507475Z =========================== short test summary info ============================
2025-12-04T10:52:00.9510379Z FAILED [0.0230s] inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned - torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.')
2025-12-04T10:52:00.9513095Z 
2025-12-04T10:52:00.9513194Z from user code:
2025-12-04T10:52:00.9513688Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:00.9514279Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:00.9514546Z 
2025-12-04T10:52:00.9515252Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:00.9516089Z 
2025-12-04T10:52:00.9516093Z 
2025-12-04T10:52:00.9516305Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9517173Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:00.9517829Z 
2025-12-04T10:52:00.9518090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9518676Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:52:00.9519203Z ============= 1 failed, 5 passed, 13 deselected, 2 rerun in 5.96s ==============
2025-12-04T10:52:00.9519653Z Got exit code 1
2025-12-04T10:52:00.9519904Z Retrying single test...
2025-12-04T10:52:00.9520657Z W1204 10:48:40.163000 80955 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:00.9521762Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a962ee87389a597a.xml
2025-12-04T10:52:00.9522647Z ============================= test session starts ==============================
2025-12-04T10:52:00.9523306Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:52:00.9523902Z cachedir: .pytest_cache
2025-12-04T10:52:00.9524605Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:52:00.9525453Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:52:00.9525805Z configfile: pytest.ini
2025-12-04T10:52:00.9526526Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:52:00.9527406Z collecting ... collected 96 items / 95 deselected / 1 selected
2025-12-04T10:52:00.9528350Z stepcurrent: skipping 18 already run items. Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:00.9529199Z Running 1 items in this shard
2025-12-04T10:52:00.9529404Z 
2025-12-04T10:52:00.9530333Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:48:42.639934199 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:00.9531366Z 
2025-12-04T10:52:00.9531514Z ('RERUN', {'yellow': True}) [15.2599s] [100%]
2025-12-04T10:52:00.9532665Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:48:57.838425527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:00.9533709Z 
2025-12-04T10:52:00.9533838Z ('RERUN', {'yellow': True}) [0.0265s] [100%]
2025-12-04T10:52:00.9534997Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:48:57.863701046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:00.9536025Z 
2025-12-04T10:52:00.9536136Z FAILED [0.0229s] [100%]
2025-12-04T10:52:00.9536307Z 
2025-12-04T10:52:00.9536448Z ==================================== RERUNS ====================================
2025-12-04T10:52:00.9537005Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________
2025-12-04T10:52:00.9537535Z Traceback (most recent call last):
2025-12-04T10:52:00.9538282Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:00.9539078Z     out, code = run_and_get_code(f_compiled, *inputs)
2025-12-04T10:52:00.9539837Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:52:00.9540550Z     result = fn(*args, **kwargs)
2025-12-04T10:52:00.9541234Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T10:52:00.9541949Z     return fn(*args, **kwargs)
2025-12-04T10:52:00.9542612Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__
2025-12-04T10:52:00.9543338Z     result = self._torchdynamo_orig_backend(
2025-12-04T10:52:00.9544038Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__
2025-12-04T10:52:00.9544744Z     result = self._inner_convert(
2025-12-04T10:52:00.9545417Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__
2025-12-04T10:52:00.9546100Z     result = _compile(
2025-12-04T10:52:00.9546720Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile
2025-12-04T10:52:00.9547622Z     guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
2025-12-04T10:52:00.9548454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
2025-12-04T10:52:00.9549153Z     return function(*args, **kwargs)
2025-12-04T10:52:00.9549872Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner
2025-12-04T10:52:00.9550649Z     return _compile_inner(code, one_graph, hooks)
2025-12-04T10:52:00.9551420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner
2025-12-04T10:52:00.9552213Z     dynamo_output = compile_frame(
2025-12-04T10:52:00.9552929Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
2025-12-04T10:52:00.9553784Z     bytecode, tracer_output = transform_code_object(code, transform)
2025-12-04T10:52:00.9554727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
2025-12-04T10:52:00.9555668Z     tracer_output = transformations(instructions, code_options)
2025-12-04T10:52:00.9556464Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
2025-12-04T10:52:00.9557176Z     tracer_output = trace_frame(
2025-12-04T10:52:00.9557812Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
2025-12-04T10:52:00.9558502Z     return fn(*args, **kwargs)
2025-12-04T10:52:00.9559173Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame
2025-12-04T10:52:00.9559874Z     run_tracer()
2025-12-04T10:52:00.9560478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer
2025-12-04T10:52:00.9561170Z     tracer.run()
2025-12-04T10:52:00.9561783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run
2025-12-04T10:52:00.9562527Z     while self.step():
2025-12-04T10:52:00.9563168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step
2025-12-04T10:52:00.9563914Z     self.dispatch_table[inst.opcode](self, inst)
2025-12-04T10:52:00.9564661Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper
2025-12-04T10:52:00.9565370Z     return inner_fn(self, inst)
2025-12-04T10:52:00.9566104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW
2025-12-04T10:52:00.9566885Z     self.call_function(fn, args, kwargs)
2025-12-04T10:52:00.9567626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function
2025-12-04T10:52:00.9568522Z     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
2025-12-04T10:52:00.9569439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward
2025-12-04T10:52:00.9570275Z     return getattr(self.realize(), name)(*args, **kwargs)
2025-12-04T10:52:00.9571067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function
2025-12-04T10:52:00.9571819Z     tensor_variable = wrap_fx_proxy(
2025-12-04T10:52:00.9572573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy
2025-12-04T10:52:00.9573439Z     return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
2025-12-04T10:52:00.9574299Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls
2025-12-04T10:52:00.9575073Z     out = _wrap_fx_proxy(
2025-12-04T10:52:00.9575857Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy
2025-12-04T10:52:00.9576753Z     example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
2025-12-04T10:52:00.9577589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value
2025-12-04T10:52:00.9578445Z     raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
2025-12-04T10:52:00.9579298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value
2025-12-04T10:52:00.9580047Z     ret_val = wrap_fake_exception(
2025-12-04T10:52:00.9580752Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception
2025-12-04T10:52:00.9581455Z     return fn()
2025-12-04T10:52:00.9582010Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in <lambda>
2025-12-04T10:52:00.9582764Z     lambda: run_node(tx.output, node, args, kwargs, nnmodule)
2025-12-04T10:52:00.9583502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node
2025-12-04T10:52:00.9584258Z     raise RuntimeError(make_error_message(e)).with_traceback(
2025-12-04T10:52:00.9584990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node
2025-12-04T10:52:00.9585758Z     return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9684946Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> 
const&, double, bool, std::optional<double>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 
_PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 <unwind unsupported> from ??:0\n')
2025-12-04T10:52:00.9783141Z 
2025-12-04T10:52:00.9783243Z from user code:
2025-12-04T10:52:00.9783738Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:00.9784336Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:00.9784601Z 
2025-12-04T10:52:00.9785302Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:00.9786152Z 
2025-12-04T10:52:00.9786156Z 
2025-12-04T10:52:00.9786375Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:00.9787373Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:00.9788037Z 
2025-12-04T10:52:00.9788321Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:00.9788930Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:00.9789389Z frames [('total', 1)]
2025-12-04T10:52:00.9789776Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:00.9791278Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:00.9792799Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9794863Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:00.9796789Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9798274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:00.9799744Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9801372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:00.9803019Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9804504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:00.9805966Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9807454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:00.9808951Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9810408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:52:00.9811864Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9812468Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________
2025-12-04T10:52:00.9813000Z Traceback (most recent call last):
2025-12-04T10:52:00.9813747Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:00.9814539Z     out, code = run_and_get_code(f_compiled, *inputs)
2025-12-04T10:52:00.9815298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:52:00.9816012Z     result = fn(*args, **kwargs)
2025-12-04T10:52:00.9816705Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T10:52:00.9817412Z     return fn(*args, **kwargs)
2025-12-04T10:52:00.9818076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__
2025-12-04T10:52:00.9818811Z     result = self._torchdynamo_orig_backend(
2025-12-04T10:52:00.9819533Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__
2025-12-04T10:52:00.9820230Z     result = self._inner_convert(
2025-12-04T10:52:00.9820912Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__
2025-12-04T10:52:00.9821601Z     result = _compile(
2025-12-04T10:52:00.9822222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile
2025-12-04T10:52:00.9823055Z     guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
2025-12-04T10:52:00.9823894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
2025-12-04T10:52:00.9824604Z     return function(*args, **kwargs)
2025-12-04T10:52:00.9825308Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner
2025-12-04T10:52:00.9826190Z     return _compile_inner(code, one_graph, hooks)
2025-12-04T10:52:00.9826966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner
2025-12-04T10:52:00.9827699Z     dynamo_output = compile_frame(
2025-12-04T10:52:00.9828416Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
2025-12-04T10:52:00.9829263Z     bytecode, tracer_output = transform_code_object(code, transform)
2025-12-04T10:52:00.9830228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
2025-12-04T10:52:00.9831302Z     tracer_output = transformations(instructions, code_options)
2025-12-04T10:52:00.9832116Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
2025-12-04T10:52:00.9832824Z     tracer_output = trace_frame(
2025-12-04T10:52:00.9833482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
2025-12-04T10:52:00.9834161Z     return fn(*args, **kwargs)
2025-12-04T10:52:00.9834839Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame
2025-12-04T10:52:00.9835526Z     run_tracer()
2025-12-04T10:52:00.9836146Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer
2025-12-04T10:52:00.9836846Z     tracer.run()
2025-12-04T10:52:00.9837450Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run
2025-12-04T10:52:00.9838146Z     while self.step():
2025-12-04T10:52:00.9838791Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step
2025-12-04T10:52:00.9839538Z     self.dispatch_table[inst.opcode](self, inst)
2025-12-04T10:52:00.9840275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper
2025-12-04T10:52:00.9840984Z     return inner_fn(self, inst)
2025-12-04T10:52:00.9841717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW
2025-12-04T10:52:00.9842565Z     self.call_function(fn, args, kwargs)
2025-12-04T10:52:00.9843304Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function
2025-12-04T10:52:00.9844201Z     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
2025-12-04T10:52:00.9845127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward
2025-12-04T10:52:00.9845948Z     return getattr(self.realize(), name)(*args, **kwargs)
2025-12-04T10:52:00.9846760Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function
2025-12-04T10:52:00.9847513Z     tensor_variable = wrap_fx_proxy(
2025-12-04T10:52:00.9848258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy
2025-12-04T10:52:00.9849100Z     return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
2025-12-04T10:52:00.9849979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls
2025-12-04T10:52:00.9850755Z     out = _wrap_fx_proxy(
2025-12-04T10:52:00.9851467Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy
2025-12-04T10:52:00.9852361Z     example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
2025-12-04T10:52:00.9853204Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value
2025-12-04T10:52:00.9854061Z     raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
2025-12-04T10:52:00.9855005Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value
2025-12-04T10:52:00.9855710Z     ret_val = wrap_fake_exception(
2025-12-04T10:52:00.9856420Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception
2025-12-04T10:52:00.9857131Z     return fn()
2025-12-04T10:52:00.9857690Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in <lambda>
2025-12-04T10:52:00.9858505Z     lambda: run_node(tx.output, node, args, kwargs, nnmodule)
2025-12-04T10:52:00.9859248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node
2025-12-04T10:52:00.9859999Z     raise RuntimeError(make_error_message(e)).with_traceback(
2025-12-04T10:52:00.9860727Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node
2025-12-04T10:52:00.9861496Z     return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:00.9960528Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> 
const&, double, bool, std::optional<double>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 
_PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 <unwind unsupported> from ??:0\n')
2025-12-04T10:52:01.0058335Z 
2025-12-04T10:52:01.0058438Z from user code:
2025-12-04T10:52:01.0058935Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:01.0059543Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:01.0059796Z 
2025-12-04T10:52:01.0060508Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:01.0061346Z 
2025-12-04T10:52:01.0061350Z 
2025-12-04T10:52:01.0061568Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.0062439Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.0063092Z 
2025-12-04T10:52:01.0063368Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.0063982Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:01.0064441Z frames [('total', 1)]
2025-12-04T10:52:01.0064831Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:01.0066348Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:01.0067838Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0069777Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:01.0071695Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0073184Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:01.0074669Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0076248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:01.0077737Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0079221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:01.0080782Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0082315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:01.0083811Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0085262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:52:01.0086712Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0087257Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:01.0087722Z frames [('total', 1)]
2025-12-04T10:52:01.0088113Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:01.0089616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:01.0091098Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0093020Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:01.0094945Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0096430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:01.0097906Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0099362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:01.0100969Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0102456Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:01.0103942Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0105422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:01.0106915Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0107508Z =================================== FAILURES ===================================
2025-12-04T10:52:01.0108070Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________
2025-12-04T10:52:01.0108587Z Traceback (most recent call last):
2025-12-04T10:52:01.0109335Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.0110142Z     out, code = run_and_get_code(f_compiled, *inputs)
2025-12-04T10:52:01.0110980Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:52:01.0111684Z     result = fn(*args, **kwargs)
2025-12-04T10:52:01.0112379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T10:52:01.0113100Z     return fn(*args, **kwargs)
2025-12-04T10:52:01.0113757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__
2025-12-04T10:52:01.0114490Z     result = self._torchdynamo_orig_backend(
2025-12-04T10:52:01.0115206Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__
2025-12-04T10:52:01.0115913Z     result = self._inner_convert(
2025-12-04T10:52:01.0116578Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__
2025-12-04T10:52:01.0117265Z     result = _compile(
2025-12-04T10:52:01.0117902Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile
2025-12-04T10:52:01.0118717Z     guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
2025-12-04T10:52:01.0119552Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
2025-12-04T10:52:01.0120264Z     return function(*args, **kwargs)
2025-12-04T10:52:01.0120986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner
2025-12-04T10:52:01.0121750Z     return _compile_inner(code, one_graph, hooks)
2025-12-04T10:52:01.0122582Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner
2025-12-04T10:52:01.0123335Z     dynamo_output = compile_frame(
2025-12-04T10:52:01.0124056Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
2025-12-04T10:52:01.0124897Z     bytecode, tracer_output = transform_code_object(code, transform)
2025-12-04T10:52:01.0125851Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
2025-12-04T10:52:01.0126791Z     tracer_output = transformations(instructions, code_options)
2025-12-04T10:52:01.0127584Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
2025-12-04T10:52:01.0128296Z     tracer_output = trace_frame(
2025-12-04T10:52:01.0128944Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
2025-12-04T10:52:01.0129621Z     return fn(*args, **kwargs)
2025-12-04T10:52:01.0130279Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame
2025-12-04T10:52:01.0130976Z     run_tracer()
2025-12-04T10:52:01.0131593Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer
2025-12-04T10:52:01.0132293Z     tracer.run()
2025-12-04T10:52:01.0132888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run
2025-12-04T10:52:01.0133572Z     while self.step():
2025-12-04T10:52:01.0134203Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step
2025-12-04T10:52:01.0135016Z     self.dispatch_table[inst.opcode](self, inst)
2025-12-04T10:52:01.0135764Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper
2025-12-04T10:52:01.0136474Z     return inner_fn(self, inst)
2025-12-04T10:52:01.0137210Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW
2025-12-04T10:52:01.0137971Z     self.call_function(fn, args, kwargs)
2025-12-04T10:52:01.0138784Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function
2025-12-04T10:52:01.0139675Z     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
2025-12-04T10:52:01.0140572Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward
2025-12-04T10:52:01.0141410Z     return getattr(self.realize(), name)(*args, **kwargs)
2025-12-04T10:52:01.0142217Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function
2025-12-04T10:52:01.0142970Z     tensor_variable = wrap_fx_proxy(
2025-12-04T10:52:01.0143699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy
2025-12-04T10:52:01.0144562Z     return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
2025-12-04T10:52:01.0145439Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls
2025-12-04T10:52:01.0146218Z     out = _wrap_fx_proxy(
2025-12-04T10:52:01.0146911Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy
2025-12-04T10:52:01.0147818Z     example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
2025-12-04T10:52:01.0148664Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value
2025-12-04T10:52:01.0149506Z     raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
2025-12-04T10:52:01.0150364Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value
2025-12-04T10:52:01.0151062Z     ret_val = wrap_fake_exception(
2025-12-04T10:52:01.0151766Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception
2025-12-04T10:52:01.0152462Z     return fn()
2025-12-04T10:52:01.0153031Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in <lambda>
2025-12-04T10:52:01.0153779Z     lambda: run_node(tx.output, node, args, kwargs, nnmodule)
2025-12-04T10:52:01.0154509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node
2025-12-04T10:52:01.0155260Z     raise RuntimeError(make_error_message(e)).with_traceback(
2025-12-04T10:52:01.0156008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node
2025-12-04T10:52:01.0156773Z     return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0255887Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> 
const&, double, bool, std::optional<double>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 
_PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 <unwind unsupported> from ??:0\n')
2025-12-04T10:52:01.0354089Z 
2025-12-04T10:52:01.0354194Z from user code:
2025-12-04T10:52:01.0354697Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:01.0355404Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:01.0355675Z 
2025-12-04T10:52:01.0356376Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:01.0357203Z 
2025-12-04T10:52:01.0357222Z 
2025-12-04T10:52:01.0357437Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.0358316Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.0359056Z 
2025-12-04T10:52:01.0359318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.0359944Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:01.0360510Z frames [('total', 1)]
2025-12-04T10:52:01.0360905Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:01.0362464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:01.0363968Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0366007Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:01.0367924Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0369399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:01.0370881Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0372354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:01.0373824Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0375307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:01.0376769Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0378263Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:01.0379754Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0381201Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:52:01.0382641Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0383213Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:01.0383674Z frames [('total', 1)]
2025-12-04T10:52:01.0384058Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:01.0385716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:01.0387208Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0388789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:01.0389078Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0390208Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:01.0390433Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0391561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:01.0391767Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0392917Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:01.0393129Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0394291Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:01.0394498Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0394728Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:01.0394834Z frames [('total', 1)]
2025-12-04T10:52:01.0395541Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a962ee87389a597a.xml -
2025-12-04T10:52:01.0395734Z =========================== short test summary info ============================
2025-12-04T10:52:01.0495547Z FAILED [0.0229s] inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned - torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#10 
c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#17 
torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 
_PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 <unwind unsupported> from ??:0\n')
2025-12-04T10:52:01.0496033Z 
2025-12-04T10:52:01.0496138Z from user code:
2025-12-04T10:52:01.0496489Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:01.0496623Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:01.0496633Z 
2025-12-04T10:52:01.0497349Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:01.0497354Z 
2025-12-04T10:52:01.0497359Z 
2025-12-04T10:52:01.0497575Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.0498157Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.0498178Z 
2025-12-04T10:52:01.0498444Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.0498626Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:52:01.0498839Z ================== 1 failed, 95 deselected, 2 rerun in 15.35s ==================
2025-12-04T10:52:01.0498937Z Got exit code 1
2025-12-04T10:52:01.0499042Z Retrying single test...
2025-12-04T10:52:01.0499552Z W1204 10:49:07.528000 81075 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:01.0500079Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a729e49bf29a928c.xml
2025-12-04T10:52:01.0500256Z ============================= test session starts ==============================
2025-12-04T10:52:01.0500607Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:52:01.0500714Z cachedir: .pytest_cache
2025-12-04T10:52:01.0501501Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:52:01.0501627Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:52:01.0501736Z configfile: pytest.ini
2025-12-04T10:52:01.0502288Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:52:01.0502507Z collecting ... collected 96 items / 95 deselected / 1 selected
2025-12-04T10:52:01.0503150Z stepcurrent: skipping 18 already run items. Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.0503264Z Running 1 items in this shard
2025-12-04T10:52:01.0503269Z 
2025-12-04T10:52:01.0504189Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:49:09.019090885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:01.0504209Z 
2025-12-04T10:52:01.0504338Z ('RERUN', {'yellow': True}) [14.9005s] [100%]
2025-12-04T10:52:01.0505243Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:49:24.857973452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:01.0505249Z 
2025-12-04T10:52:01.0505392Z ('RERUN', {'yellow': True}) [0.0258s] [100%]
2025-12-04T10:52:01.0506292Z inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned [W1204 10:49:24.882090558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:01.0506296Z 
2025-12-04T10:52:01.0506408Z FAILED [0.0219s] [100%]
2025-12-04T10:52:01.0506412Z 
2025-12-04T10:52:01.0506555Z ==================================== RERUNS ====================================
2025-12-04T10:52:01.0506824Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________
2025-12-04T10:52:01.0506957Z Traceback (most recent call last):
2025-12-04T10:52:01.0507472Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.0507644Z     out, code = run_and_get_code(f_compiled, *inputs)
2025-12-04T10:52:01.0508113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:52:01.0508228Z     result = fn(*args, **kwargs)
2025-12-04T10:52:01.0508716Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T10:52:01.0508825Z     return fn(*args, **kwargs)
2025-12-04T10:52:01.0509277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__
2025-12-04T10:52:01.0509534Z     result = self._torchdynamo_orig_backend(
2025-12-04T10:52:01.0509986Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__
2025-12-04T10:52:01.0510115Z     result = self._inner_convert(
2025-12-04T10:52:01.0510563Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__
2025-12-04T10:52:01.0510663Z     result = _compile(
2025-12-04T10:52:01.0511126Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile
2025-12-04T10:52:01.0511456Z     guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
2025-12-04T10:52:01.0511910Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
2025-12-04T10:52:01.0512036Z     return function(*args, **kwargs)
2025-12-04T10:52:01.0512519Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner
2025-12-04T10:52:01.0512684Z     return _compile_inner(code, one_graph, hooks)
2025-12-04T10:52:01.0513166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner
2025-12-04T10:52:01.0513281Z     dynamo_output = compile_frame(
2025-12-04T10:52:01.0513774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
2025-12-04T10:52:01.0514005Z     bytecode, tracer_output = transform_code_object(code, transform)
2025-12-04T10:52:01.0514607Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
2025-12-04T10:52:01.0514815Z     tracer_output = transformations(instructions, code_options)
2025-12-04T10:52:01.0515271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
2025-12-04T10:52:01.0515404Z     tracer_output = trace_frame(
2025-12-04T10:52:01.0515827Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
2025-12-04T10:52:01.0515936Z     return fn(*args, **kwargs)
2025-12-04T10:52:01.0516407Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame
2025-12-04T10:52:01.0516501Z     run_tracer()
2025-12-04T10:52:01.0516969Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer
2025-12-04T10:52:01.0517071Z     tracer.run()
2025-12-04T10:52:01.0517512Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run
2025-12-04T10:52:01.0517627Z     while self.step():
2025-12-04T10:52:01.0518074Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step
2025-12-04T10:52:01.0518225Z     self.dispatch_table[inst.opcode](self, inst)
2025-12-04T10:52:01.0518693Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper
2025-12-04T10:52:01.0518807Z     return inner_fn(self, inst)
2025-12-04T10:52:01.0519330Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW
2025-12-04T10:52:01.0519452Z     self.call_function(fn, args, kwargs)
2025-12-04T10:52:01.0519950Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function
2025-12-04T10:52:01.0520222Z     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
2025-12-04T10:52:01.0520735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward
2025-12-04T10:52:01.0520926Z     return getattr(self.realize(), name)(*args, **kwargs)
2025-12-04T10:52:01.0521475Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function
2025-12-04T10:52:01.0521594Z     tensor_variable = wrap_fx_proxy(
2025-12-04T10:52:01.0522175Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy
2025-12-04T10:52:01.0522391Z     return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
2025-12-04T10:52:01.0522918Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls
2025-12-04T10:52:01.0523101Z     out = _wrap_fx_proxy(
2025-12-04T10:52:01.0523608Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy
2025-12-04T10:52:01.0523876Z     example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
2025-12-04T10:52:01.0524316Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value
2025-12-04T10:52:01.0524589Z     raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
2025-12-04T10:52:01.0525042Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value
2025-12-04T10:52:01.0525159Z     ret_val = wrap_fake_exception(
2025-12-04T10:52:01.0525645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception
2025-12-04T10:52:01.0525737Z     return fn()
2025-12-04T10:52:01.0526147Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in <lambda>
2025-12-04T10:52:01.0526358Z     lambda: run_node(tx.output, node, args, kwargs, nnmodule)
2025-12-04T10:52:01.0526768Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node
2025-12-04T10:52:01.0526965Z     raise RuntimeError(make_error_message(e)).with_traceback(
2025-12-04T10:52:01.0527393Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node
2025-12-04T10:52:01.0527602Z     return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0626263Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> 
const&, double, bool, std::optional<double>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 
_PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 <unwind unsupported> from ??:0\n')
2025-12-04T10:52:01.0626730Z 
2025-12-04T10:52:01.0626833Z from user code:
2025-12-04T10:52:01.0627172Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:01.0627315Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:01.0627320Z 
2025-12-04T10:52:01.0628024Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:01.0628030Z 
2025-12-04T10:52:01.0628035Z 
2025-12-04T10:52:01.0628259Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.0628784Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.0628793Z 
2025-12-04T10:52:01.0629062Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.0629293Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:01.0629396Z frames [('total', 1)]
2025-12-04T10:52:01.0629621Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:01.0630843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:01.0631057Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0632656Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:01.0632928Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0634078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:01.0634285Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0635418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:01.0635624Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0636753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:01.0636973Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0638117Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:01.0638336Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0639432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:52:01.0639634Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0639923Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________
2025-12-04T10:52:01.0640044Z Traceback (most recent call last):
2025-12-04T10:52:01.0640566Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.0640724Z     out, code = run_and_get_code(f_compiled, *inputs)
2025-12-04T10:52:01.0641197Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:52:01.0641320Z     result = fn(*args, **kwargs)
2025-12-04T10:52:01.0641792Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T10:52:01.0641900Z     return fn(*args, **kwargs)
2025-12-04T10:52:01.0642424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__
2025-12-04T10:52:01.0654068Z     result = self._torchdynamo_orig_backend(
2025-12-04T10:52:01.0654645Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__
2025-12-04T10:52:01.0654766Z     result = self._inner_convert(
2025-12-04T10:52:01.0655240Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__
2025-12-04T10:52:01.0655346Z     result = _compile(
2025-12-04T10:52:01.0655975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile
2025-12-04T10:52:01.0656217Z     guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
2025-12-04T10:52:01.0656675Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
2025-12-04T10:52:01.0656810Z     return function(*args, **kwargs)
2025-12-04T10:52:01.0657290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner
2025-12-04T10:52:01.0657515Z     return _compile_inner(code, one_graph, hooks)
2025-12-04T10:52:01.0658014Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner
2025-12-04T10:52:01.0658132Z     dynamo_output = compile_frame(
2025-12-04T10:52:01.0658626Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
2025-12-04T10:52:01.0658859Z     bytecode, tracer_output = transform_code_object(code, transform)
2025-12-04T10:52:01.0659451Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
2025-12-04T10:52:01.0659676Z     tracer_output = transformations(instructions, code_options)
2025-12-04T10:52:01.0660137Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
2025-12-04T10:52:01.0660265Z     tracer_output = trace_frame(
2025-12-04T10:52:01.0660697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
2025-12-04T10:52:01.0660811Z     return fn(*args, **kwargs)
2025-12-04T10:52:01.0661285Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame
2025-12-04T10:52:01.0661381Z     run_tracer()
2025-12-04T10:52:01.0661840Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer
2025-12-04T10:52:01.0661952Z     tracer.run()
2025-12-04T10:52:01.0662397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run
2025-12-04T10:52:01.0662514Z     while self.step():
2025-12-04T10:52:01.0662959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step
2025-12-04T10:52:01.0663111Z     self.dispatch_table[inst.opcode](self, inst)
2025-12-04T10:52:01.0663585Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper
2025-12-04T10:52:01.0663698Z     return inner_fn(self, inst)
2025-12-04T10:52:01.0664211Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW
2025-12-04T10:52:01.0664345Z     self.call_function(fn, args, kwargs)
2025-12-04T10:52:01.0664843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function
2025-12-04T10:52:01.0665097Z     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
2025-12-04T10:52:01.0665625Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward
2025-12-04T10:52:01.0665801Z     return getattr(self.realize(), name)(*args, **kwargs)
2025-12-04T10:52:01.0666305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function
2025-12-04T10:52:01.0666427Z     tensor_variable = wrap_fx_proxy(
2025-12-04T10:52:01.0666928Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy
2025-12-04T10:52:01.0667152Z     return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
2025-12-04T10:52:01.0667823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls
2025-12-04T10:52:01.0667948Z     out = _wrap_fx_proxy(
2025-12-04T10:52:01.0668454Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy
2025-12-04T10:52:01.0668711Z     example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
2025-12-04T10:52:01.0669165Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value
2025-12-04T10:52:01.0669512Z     raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
2025-12-04T10:52:01.0669966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value
2025-12-04T10:52:01.0670083Z     ret_val = wrap_fake_exception(
2025-12-04T10:52:01.0670553Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception
2025-12-04T10:52:01.0670664Z     return fn()
2025-12-04T10:52:01.0671075Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in <lambda>
2025-12-04T10:52:01.0671269Z     lambda: run_node(tx.output, node, args, kwargs, nnmodule)
2025-12-04T10:52:01.0671695Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node
2025-12-04T10:52:01.0671891Z     raise RuntimeError(make_error_message(e)).with_traceback(
2025-12-04T10:52:01.0672314Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node
2025-12-04T10:52:01.0672527Z     return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0770920Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> 
const&, double, bool, std::optional<double>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 
_PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 <unwind unsupported> from ??:0\n')
2025-12-04T10:52:01.0771382Z 
2025-12-04T10:52:01.0771500Z from user code:
2025-12-04T10:52:01.0771836Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:01.0771968Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:01.0771979Z 
2025-12-04T10:52:01.0772699Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:01.0772705Z 
2025-12-04T10:52:01.0772709Z 
2025-12-04T10:52:01.0772922Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.0773466Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.0773472Z 
2025-12-04T10:52:01.0773736Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.0773967Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:01.0774071Z frames [('total', 1)]
2025-12-04T10:52:01.0774284Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:01.0775458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:01.0775677Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0777340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:01.0777554Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0778688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:01.0778966Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0780093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:01.0780311Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0781439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:01.0781656Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0782794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:01.0783002Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0784114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:52:01.0784326Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0784553Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:01.0784657Z frames [('total', 1)]
2025-12-04T10:52:01.0784871Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:01.0786025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:01.0786234Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0787964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:01.0788172Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0789300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:01.0789519Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0790649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:01.0790866Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0792060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:01.0792276Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0793418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:01.0793675Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0793831Z =================================== FAILURES ===================================
2025-12-04T10:52:01.0794099Z ____________ CudaReproTests.test_effn_attn_bias_padding_misaligned _____________
2025-12-04T10:52:01.0794220Z Traceback (most recent call last):
2025-12-04T10:52:01.0794749Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 249, in test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.0794906Z     out, code = run_and_get_code(f_compiled, *inputs)
2025-12-04T10:52:01.0795389Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 2409, in run_and_get_code
2025-12-04T10:52:01.0795501Z     result = fn(*args, **kwargs)
2025-12-04T10:52:01.0795975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T10:52:01.0796100Z     return fn(*args, **kwargs)
2025-12-04T10:52:01.0796559Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2194, in __call__
2025-12-04T10:52:01.0796708Z     result = self._torchdynamo_orig_backend(
2025-12-04T10:52:01.0797159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1937, in __call__
2025-12-04T10:52:01.0797273Z     result = self._inner_convert(
2025-12-04T10:52:01.0797735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 706, in __call__
2025-12-04T10:52:01.0797835Z     result = _compile(
2025-12-04T10:52:01.0798284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1744, in _compile
2025-12-04T10:52:01.0798531Z     guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
2025-12-04T10:52:01.0798982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
2025-12-04T10:52:01.0799115Z     return function(*args, **kwargs)
2025-12-04T10:52:01.0799595Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1425, in compile_inner
2025-12-04T10:52:01.0799742Z     return _compile_inner(code, one_graph, hooks)
2025-12-04T10:52:01.0800237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1459, in _compile_inner
2025-12-04T10:52:01.0800356Z     dynamo_output = compile_frame(
2025-12-04T10:52:01.0800998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1341, in compile_frame
2025-12-04T10:52:01.0801242Z     bytecode, tracer_output = transform_code_object(code, transform)
2025-12-04T10:52:01.0801828Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
2025-12-04T10:52:01.0802129Z     tracer_output = transformations(instructions, code_options)
2025-12-04T10:52:01.0802604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1313, in transform
2025-12-04T10:52:01.0802719Z     tracer_output = trace_frame(
2025-12-04T10:52:01.0803159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 328, in _fn
2025-12-04T10:52:01.0803267Z     return fn(*args, **kwargs)
2025-12-04T10:52:01.0803869Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 837, in trace_frame
2025-12-04T10:52:01.0803966Z     run_tracer()
2025-12-04T10:52:01.0804424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 818, in run_tracer
2025-12-04T10:52:01.0804535Z     tracer.run()
2025-12-04T10:52:01.0804979Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1639, in run
2025-12-04T10:52:01.0805082Z     while self.step():
2025-12-04T10:52:01.0805549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1319, in step
2025-12-04T10:52:01.0805779Z     self.dispatch_table[inst.opcode](self, inst)
2025-12-04T10:52:01.0806246Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 858, in wrapper
2025-12-04T10:52:01.0806358Z     return inner_fn(self, inst)
2025-12-04T10:52:01.0806874Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2573, in CALL_FUNCTION_KW
2025-12-04T10:52:01.0807011Z     self.call_function(fn, args, kwargs)
2025-12-04T10:52:01.0807505Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1225, in call_function
2025-12-04T10:52:01.0807758Z     self.push(fn.call_function(self, args, kwargs))  # type: ignore[arg-type]
2025-12-04T10:52:01.0808289Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/lazy.py", line 218, in realize_and_forward
2025-12-04T10:52:01.0808470Z     return getattr(self.realize(), name)(*args, **kwargs)
2025-12-04T10:52:01.0808972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/torch.py", line 1587, in call_function
2025-12-04T10:52:01.0809091Z     tensor_variable = wrap_fx_proxy(
2025-12-04T10:52:01.0809591Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2779, in wrap_fx_proxy
2025-12-04T10:52:01.0809822Z     return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
2025-12-04T10:52:01.0810350Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2845, in wrap_fx_proxy_cls
2025-12-04T10:52:01.0810467Z     out = _wrap_fx_proxy(
2025-12-04T10:52:01.0810972Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 2956, in _wrap_fx_proxy
2025-12-04T10:52:01.0811225Z     example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
2025-12-04T10:52:01.0811685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3590, in get_fake_value
2025-12-04T10:52:01.0811954Z     raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
2025-12-04T10:52:01.0812390Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3488, in get_fake_value
2025-12-04T10:52:01.0812517Z     ret_val = wrap_fake_exception(
2025-12-04T10:52:01.0812990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 2965, in wrap_fake_exception
2025-12-04T10:52:01.0813095Z     return fn()
2025-12-04T10:52:01.0813506Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3489, in <lambda>
2025-12-04T10:52:01.0813703Z     lambda: run_node(tx.output, node, args, kwargs, nnmodule)
2025-12-04T10:52:01.0814124Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3699, in run_node
2025-12-04T10:52:01.0814325Z     raise RuntimeError(make_error_message(e)).with_traceback(
2025-12-04T10:52:01.0814745Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 3658, in run_node
2025-12-04T10:52:01.0814955Z     return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0913854Z torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#10 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> 
const&, double, bool, std::optional<double>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#17 torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from 
/usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 
_PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 <unwind unsupported> from ??:0\n')
2025-12-04T10:52:01.0914311Z 
2025-12-04T10:52:01.0914429Z from user code:
2025-12-04T10:52:01.0914764Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:01.0914955Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:01.0914960Z 
2025-12-04T10:52:01.0915674Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:01.0915680Z 
2025-12-04T10:52:01.0915686Z 
2025-12-04T10:52:01.0915902Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.0916445Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.0916450Z 
2025-12-04T10:52:01.0916711Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.0916946Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:01.0917049Z frames [('total', 1)]
2025-12-04T10:52:01.0917264Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:01.0918439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:01.0918651Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0920252Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:01.0920461Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0921599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:01.0921825Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0923016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:01.0923234Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0924363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:01.0924566Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0925728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:01.0925930Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0927114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T10:52:01.0927322Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0927550Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:01.0927652Z frames [('total', 1)]
2025-12-04T10:52:01.0927862Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:52:01.0929157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Memory efficient kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:956.)
2025-12-04T10:52:01.0929362Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0930958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Expected query, key and value to all be of dtype: {Half, Float}. Got Query dtype: c10::BFloat16, Key dtype: c10::BFloat16, and Value dtype: c10::BFloat16 instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:91.)
2025-12-04T10:52:01.0931166Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0932295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:958.)
2025-12-04T10:52:01.0932516Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0933641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: Flash attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/sdp_utils_cpp.h:540.)
2025-12-04T10:52:01.0933868Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0934993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention kernel not used because: (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:960.)
2025-12-04T10:52:01.0935210Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0936355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/utils.py:3658: UserWarning: cuDNN attention has been runtime disabled. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:676.)
2025-12-04T10:52:01.0936563Z   return node.target(*args, **kwargs)  # type: ignore[operator]
2025-12-04T10:52:01.0936791Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:52:01.0936891Z frames [('total', 1)]
2025-12-04T10:52:01.0937603Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a729e49bf29a928c.xml -
2025-12-04T10:52:01.0937787Z =========================== short test summary info ============================
2025-12-04T10:52:01.1037657Z FAILED [0.0219s] inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned - torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function scaled_dot_product_attention>(*(FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16), FakeTensor(..., device='cuda:0', size=(32, 16, 1007, 64), dtype=torch.bfloat16)), **{'attn_mask': FakeTensor(..., device='cuda:0', size=(32, 1, 1007, 1007), dtype=torch.bool), 'dropout_p': 0.0}): got RuntimeError('No available kernel. Aborting execution.\nException raised from select_sdp_backend at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/sdp_utils.cpp:962 (most recent call first):\nC++ CapturedTraceback:\n#4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0\n#5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0\n#6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0\n#7 sdp::select_sdp_backend(sdp::sdp_params const&) from :0\n#8 at::native::_fused_sdp_choice_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#9 at::native::scaled_dot_product_attention(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#10 
c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__scaled_dot_product_attention>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool> >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from RegisterCompositeImplicitAutograd_0.cpp:0\n#11 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#12 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#13 void c10::BoxedKernel::make_boxed_function<&(anonymous namespace)::pythonTLSSnapshotFallback>(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from PythonFallbackKernel.cpp:0\n#14 c10::OperatorHandle::callBoxedForDispatchKey(c10::DispatchKey, std::vector<c10::IValue, std::allocator<c10::IValue> >&) const from :0\n#15 torch::detail::(anonymous namespace)::ConcretePyInterpreterVTable::python_dispatcher(c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const from PyInterpreter.cpp:0\n#16 at::_ops::scaled_dot_product_attention::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, double, bool, std::optional<double>, bool) from ??:0\n#17 
torch::autograd::THPVariable_scaled_dot_product_attention(_object*, _object*, _object*) from python_nn_functions.cpp:0\n#18 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543\n#19 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#20 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917\n#21 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#22 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#23 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#24 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#26 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#27 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#28 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#29 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#30 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#32 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#33 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#34 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#35 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#36 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#37 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#38 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#43 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#45 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#47 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#48 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#49 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#50 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#53 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#55 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#56 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#57 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#58 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#59 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#60 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#61 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#62 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#63 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#64 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#65 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#68 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#69 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#70 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#71 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#72 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#73 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#74 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#75 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#76 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#77 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#78 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#79 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#80 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#81 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#82 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#83 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#86 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#87 dynamo__custom_eval_frame from :0\n#88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#91 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#93 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#95 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#96 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#97 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#98 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#99 _PyObject_VectorcallTstate from 
/usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#105 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#107 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267\n#108 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#113 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#114 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#118 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#120 
_PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#124 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#125 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#126 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#127 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305\n#128 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#132 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#133 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#134 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#135 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#138 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#140 do_call_core from 
/usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#144 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#145 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#146 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#147 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#148 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#149 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#152 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#154 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#158 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#159 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#160 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#161 
_PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#162 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#163 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#168 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945\n#169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#174 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153\n#175 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431\n#176 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494\n#177 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215\n#178 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112\n#179 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#180 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#181 _PyEval_EvalFrame from 
/usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#182 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#183 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#184 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#185 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#186 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114\n#187 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46\n#188 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134\n#189 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291\n#190 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312\n#191 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208\n#192 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456\n#193 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90\n#194 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357\n#195 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090\n#196 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58\n#197 __libc_start_main_impl from ./csu/../csu/libc-start.c:392\n#198 _start from ??:0\n#199 <unwind unsupported> from ??:0\n')
2025-12-04T10:52:01.1038137Z 
2025-12-04T10:52:01.1038267Z from user code:
2025-12-04T10:52:01.1038664Z    File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 243, in f
2025-12-04T10:52:01.1038800Z     return F.scaled_dot_product_attention(
2025-12-04T10:52:01.1038805Z 
2025-12-04T10:52:01.1039523Z Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
2025-12-04T10:52:01.1039529Z 
2025-12-04T10:52:01.1039534Z 
2025-12-04T10:52:01.1039744Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1040343Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.1040348Z 
2025-12-04T10:52:01.1040609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1040787Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:52:01.1040999Z ================== 1 failed, 95 deselected, 2 rerun in 14.98s ==================
2025-12-04T10:52:01.1041096Z Got exit code 1
2025-12-04T10:52:01.1041559Z FAILED CONSISTENTLY: test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned
2025-12-04T10:52:01.1041962Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:52:01.1042462Z W1204 10:49:34.543000 81195 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:01.1042998Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a17633d8774721c5.xml
2025-12-04T10:52:01.1043164Z ============================= test session starts ==============================
2025-12-04T10:52:01.1043522Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:52:01.1043632Z cachedir: .pytest_cache
2025-12-04T10:52:01.1044148Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:52:01.1044285Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:52:01.1044391Z configfile: pytest.ini
2025-12-04T10:52:01.1044926Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:52:01.1045155Z collecting ... collected 96 items / 19 deselected / 77 selected
2025-12-04T10:52:01.1045293Z stepcurrent: skipping 19 already run items.
2025-12-04T10:52:01.1045427Z Running 77 items in this shard
2025-12-04T10:52:01.1045432Z 
2025-12-04T10:52:01.1045791Z inductor/test_cuda_repro.py::CudaReproTests::test_embedding_var_mean PASSED [3.8390s] [  1%]
2025-12-04T10:52:01.1046154Z inductor/test_cuda_repro.py::CudaReproTests::test_emulate_low_precision PASSED [0.5811s] [  2%]
2025-12-04T10:52:01.1046621Z inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_mean_ratio_chain PASSED [1.0348s] [  3%]
2025-12-04T10:52:01.1047059Z inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_min_pow_chain PASSED [0.7899s] [  5%]
2025-12-04T10:52:01.1047505Z inductor/test_cuda_repro.py::CudaReproTests::test_emulate_precision_casts_norm_rounding PASSED [0.1112s] [  6%]
2025-12-04T10:52:01.1048319Z inductor/test_cuda_repro.py::CudaReproTests::test_epilogue_fusion_with_view W1204 10:49:43.102000 81195 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:01.1048420Z PASSED [3.5055s] [  7%]
2025-12-04T10:52:01.1048829Z inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs PASSED [0.5344s] [  9%]
2025-12-04T10:52:01.1049289Z inductor/test_cuda_repro.py::CudaReproTests::test_expanded_inputs_cudagraphs_no_size_asserts PASSED [0.4844s] [ 10%]
2025-12-04T10:52:01.1049828Z inductor/test_cuda_repro.py::CudaReproTests::test_flash_attention_dynamic SKIPPED [0.0003s] (flash attention not supported) [ 11%]
2025-12-04T10:52:01.1050238Z inductor/test_cuda_repro.py::CudaReproTests::test_float64_constants PASSED [0.6928s] [ 12%]
2025-12-04T10:52:01.1050829Z inductor/test_cuda_repro.py::CudaReproTests::test_float8_e8m0fnu SKIPPED [0.0032s] (uses bfloat16 atomic add instrs which requires SM >= 90) [ 14%]
2025-12-04T10:52:01.1051148Z inductor/test_cuda_repro.py::CudaReproTests::test_full_copy PASSED [0.1728s] [ 15%]
2025-12-04T10:52:01.1051465Z inductor/test_cuda_repro.py::CudaReproTests::test_identity_load PASSED [0.6059s] [ 16%]
2025-12-04T10:52:01.1052069Z inductor/test_cuda_repro.py::CudaReproTests::test_index_add_fallback SKIPPED [0.0031s] (uses bfloat16 atomic add instrs which requires SM >= 90) [ 18%]
2025-12-04T10:52:01.1052491Z inductor/test_cuda_repro.py::CudaReproTests::test_index_put_cudagraph PASSED [0.9950s] [ 19%]
2025-12-04T10:52:01.1052878Z inductor/test_cuda_repro.py::CudaReproTests::test_index_put_inplace_cudagraph PASSED [0.4816s] [ 20%]
2025-12-04T10:52:01.1053224Z inductor/test_cuda_repro.py::CudaReproTests::test_index_put_issue PASSED [0.5307s] [ 22%]
2025-12-04T10:52:01.1053623Z inductor/test_cuda_repro.py::CudaReproTests::test_index_put_no_fallback_cudagraph PASSED [0.5661s] [ 23%]
2025-12-04T10:52:01.1054011Z inductor/test_cuda_repro.py::CudaReproTests::test_indirect_indexing_dense_mask PASSED [0.5823s] [ 24%]
2025-12-04T10:52:01.1054453Z inductor/test_cuda_repro.py::CudaReproTests::test_inductor_output_aliases_intermediate PASSED [0.0051s] [ 25%]
2025-12-04T10:52:01.1054834Z inductor/test_cuda_repro.py::CudaReproTests::test_inplace_add_alpha_autotune PASSED [0.5240s] [ 27%]
2025-12-04T10:52:01.1055218Z inductor/test_cuda_repro.py::CudaReproTests::test_inplace_buffer_autotune PASSED [0.5783s] [ 28%]
2025-12-04T10:52:01.1055598Z inductor/test_cuda_repro.py::CudaReproTests::test_inplace_updates_cudagraphs PASSED [0.3349s] [ 29%]
2025-12-04T10:52:01.1055947Z inductor/test_cuda_repro.py::CudaReproTests::test_input_channels_last PASSED [0.6611s] [ 31%]
2025-12-04T10:52:01.1056519Z inductor/test_cuda_repro.py::CudaReproTests::test_int64_index_intermediate SKIPPED [0.0031s] (uses bfloat16 which requires SM >= 80) [ 32%]
2025-12-04T10:52:01.1056828Z inductor/test_cuda_repro.py::CudaReproTests::test_issue100806 PASSED [0.6958s] [ 33%]
2025-12-04T10:52:01.1057147Z inductor/test_cuda_repro.py::CudaReproTests::test_issue103461 PASSED [0.5002s] [ 35%]
2025-12-04T10:52:01.1057450Z inductor/test_cuda_repro.py::CudaReproTests::test_issue103481 PASSED [0.2437s] [ 36%]
2025-12-04T10:52:01.1057753Z inductor/test_cuda_repro.py::CudaReproTests::test_issue104759 PASSED [0.6785s] [ 37%]
2025-12-04T10:52:01.1058109Z inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_1input PASSED [0.2135s] [ 38%]
2025-12-04T10:52:01.1058443Z inductor/test_cuda_repro.py::CudaReproTests::test_issue97695_2input PASSED [0.1810s] [ 40%]
2025-12-04T10:52:01.1058760Z inductor/test_cuda_repro.py::CudaReproTests::test_issue_103924 PASSED [0.3615s] [ 41%]
2025-12-04T10:52:01.1059100Z inductor/test_cuda_repro.py::CudaReproTests::test_libdevice_routing PASSED [0.6199s] [ 42%]
2025-12-04T10:52:01.1059440Z inductor/test_cuda_repro.py::CudaReproTests::test_linear_cpu_input PASSED [0.3390s] [ 44%]
2025-12-04T10:52:01.1059856Z inductor/test_cuda_repro.py::CudaReproTests::test_linear_with_zero_infeature_size PASSED [0.1767s] [ 45%]
2025-12-04T10:52:01.1060206Z inductor/test_cuda_repro.py::CudaReproTests::test_lookup_seed_backward PASSED [0.7036s] [ 46%]
2025-12-04T10:52:01.1060553Z inductor/test_cuda_repro.py::CudaReproTests::test_max_autotune_nograd PASSED [3.8644s] [ 48%]
2025-12-04T10:52:01.1060934Z inductor/test_cuda_repro.py::CudaReproTests::test_memory_history_inductor PASSED [0.3135s] [ 49%]
2025-12-04T10:52:01.1061289Z inductor/test_cuda_repro.py::CudaReproTests::test_mm_out_dtype_compile PASSED [0.1566s] [ 50%]
2025-12-04T10:52:01.1061689Z inductor/test_cuda_repro.py::CudaReproTests::test_multi_output_layout_fallback PASSED [0.2141s] [ 51%]
2025-12-04T10:52:01.1062049Z inductor/test_cuda_repro.py::CudaReproTests::test_mutated_aligned_tensor PASSED [0.1715s] [ 53%]
2025-12-04T10:52:01.1062903Z inductor/test_cuda_repro.py::CudaReproTests::test_negative_arange_dynamic_shapes W1204 10:50:03.479000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs
2025-12-04T10:52:01.1063309Z W1204 10:50:05.478000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs
2025-12-04T10:52:01.1063695Z W1204 10:50:12.909000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs
2025-12-04T10:52:01.1064088Z W1204 10:50:12.919000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs
2025-12-04T10:52:01.1064529Z W1204 10:50:12.929000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs
2025-12-04T10:52:01.1064913Z W1204 10:50:12.940000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs
2025-12-04T10:52:01.1065308Z W1204 10:50:12.950000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs
2025-12-04T10:52:01.1065694Z W1204 10:50:12.961000 81195 site-packages/torch/_dynamo/debug_utils.py:429] Could not generate fp64 outputs
2025-12-04T10:52:01.1065812Z PASSED [9.5289s] [ 54%]
2025-12-04T10:52:01.1066210Z inductor/test_cuda_repro.py::CudaReproTests::test_no_device_idx_repro_cudagraphs PASSED [0.2367s] [ 55%]
2025-12-04T10:52:01.1066577Z inductor/test_cuda_repro.py::CudaReproTests::test_non_commutative_scan_op PASSED [2.2204s] [ 57%]
2025-12-04T10:52:01.1067032Z inductor/test_cuda_repro.py::CudaReproTests::test_non_contiguous_unaligned_input_indices PASSED [0.0035s] [ 58%]
2025-12-04T10:52:01.1067399Z inductor/test_cuda_repro.py::CudaReproTests::test_normalize_norm_leq_one PASSED [0.2424s] [ 59%]
2025-12-04T10:52:01.1067968Z inductor/test_cuda_repro.py::CudaReproTests::test_not_initializing_wrong_device SKIPPED [0.0003s] (requires multiple cuda devices) [ 61%]
2025-12-04T10:52:01.1068292Z inductor/test_cuda_repro.py::CudaReproTests::test_permute_fusion PASSED [0.5552s] [ 62%]
2025-12-04T10:52:01.1068869Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile ('RERUN', {'yellow': True}) [0.0303s] [ 63%]
2025-12-04T10:52:01.1069453Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile ('RERUN', {'yellow': True}) [0.0045s] [ 63%]
2025-12-04T10:52:01.1069932Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile FAILED [0.0050s] [ 63%]
2025-12-04T10:52:01.1069938Z 
2025-12-04T10:52:01.1070097Z ==================================== RERUNS ====================================
2025-12-04T10:52:01.1070391Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____
2025-12-04T10:52:01.1070510Z Traceback (most recent call last):
2025-12-04T10:52:01.1071111Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1071237Z     correct = forward(*example_inputs)
2025-12-04T10:52:01.1071601Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward
2025-12-04T10:52:01.1071843Z     torch.ops.aten._scaled_dot_product_efficient_attention.default(
2025-12-04T10:52:01.1072213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T10:52:01.1072340Z     return self._op(*args, **kwargs)
2025-12-04T10:52:01.1072510Z RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1072515Z 
2025-12-04T10:52:01.1072732Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1073347Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1073352Z 
2025-12-04T10:52:01.1073614Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1073916Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____
2025-12-04T10:52:01.1074101Z Traceback (most recent call last):
2025-12-04T10:52:01.1074683Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1074815Z     correct = forward(*example_inputs)
2025-12-04T10:52:01.1075174Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward
2025-12-04T10:52:01.1075405Z     torch.ops.aten._scaled_dot_product_efficient_attention.default(
2025-12-04T10:52:01.1075838Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T10:52:01.1075951Z     return self._op(*args, **kwargs)
2025-12-04T10:52:01.1076128Z RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1076134Z 
2025-12-04T10:52:01.1076345Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1076950Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1076957Z 
2025-12-04T10:52:01.1077226Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1077371Z =================================== FAILURES ===================================
2025-12-04T10:52:01.1077672Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____
2025-12-04T10:52:01.1077790Z Traceback (most recent call last):
2025-12-04T10:52:01.1078365Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1078506Z     correct = forward(*example_inputs)
2025-12-04T10:52:01.1078862Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward
2025-12-04T10:52:01.1079092Z     torch.ops.aten._scaled_dot_product_efficient_attention.default(
2025-12-04T10:52:01.1079472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T10:52:01.1079586Z     return self._op(*args, **kwargs)
2025-12-04T10:52:01.1079763Z RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1079768Z 
2025-12-04T10:52:01.1079981Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1080578Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1080597Z 
2025-12-04T10:52:01.1080859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1081564Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a17633d8774721c5.xml -
2025-12-04T10:52:01.1081746Z =========================== short test summary info ============================
2025-12-04T10:52:01.1082514Z FAILED [0.0050s] inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile - RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1082521Z 
2025-12-04T10:52:01.1082747Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1083345Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1083350Z 
2025-12-04T10:52:01.1083608Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1083800Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:52:01.1084035Z ======= 1 failed, 43 passed, 5 skipped, 19 deselected, 2 rerun in 39.99s =======
2025-12-04T10:52:01.1084131Z Got exit code 1
2025-12-04T10:52:01.1084250Z Retrying single test...
2025-12-04T10:52:01.1084689Z W1204 10:50:27.877000 82859 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:01.1085296Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3b77aed58497c4ef.xml
2025-12-04T10:52:01.1085459Z ============================= test session starts ==============================
2025-12-04T10:52:01.1085802Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:52:01.1085921Z cachedir: .pytest_cache
2025-12-04T10:52:01.1086429Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:52:01.1086609Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:52:01.1086731Z configfile: pytest.ini
2025-12-04T10:52:01.1087263Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:52:01.1087491Z collecting ... collected 96 items / 95 deselected / 1 selected
2025-12-04T10:52:01.1088175Z stepcurrent: skipping 67 already run items. Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1088289Z Running 1 items in this shard
2025-12-04T10:52:01.1088294Z 
2025-12-04T10:52:01.1089250Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:50:29.258688608 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor)
2025-12-04T10:52:01.1089763Z [W1204 10:50:29.258704409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:01.1089772Z 
2025-12-04T10:52:01.1089914Z ('RERUN', {'yellow': True}) [15.5910s] [100%]
2025-12-04T10:52:01.1090892Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:50:45.858803773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:01.1090901Z 
2025-12-04T10:52:01.1091044Z ('RERUN', {'yellow': True}) [0.0067s] [100%]
2025-12-04T10:52:01.1092015Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:50:45.864580178 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:01.1092020Z 
2025-12-04T10:52:01.1092118Z FAILED [0.0039s] [100%]
2025-12-04T10:52:01.1092141Z 
2025-12-04T10:52:01.1092280Z ==================================== RERUNS ====================================
2025-12-04T10:52:01.1092574Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____
2025-12-04T10:52:01.1092710Z Traceback (most recent call last):
2025-12-04T10:52:01.1093290Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1093413Z     correct = forward(*example_inputs)
2025-12-04T10:52:01.1093790Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward
2025-12-04T10:52:01.1094018Z     torch.ops.aten._scaled_dot_product_efficient_attention.default(
2025-12-04T10:52:01.1094395Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T10:52:01.1094510Z     return self._op(*args, **kwargs)
2025-12-04T10:52:01.1094677Z RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1095476Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first):
2025-12-04T10:52:01.1095591Z C++ CapturedTraceback:
2025-12-04T10:52:01.1096943Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:52:01.1097423Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:52:01.1097747Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T10:52:01.1099078Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<long>, std::optional<long>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1101062Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1108213Z #9 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1109628Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1111229Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&) const from x_0.cudafe1.cpp:0
2025-12-04T10:52:01.1112109Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1116231Z #13 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1117197Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1118330Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from VariableType_3.cpp:0
2025-12-04T10:52:01.1121788Z #16 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_3.cpp:0
2025-12-04T10:52:01.1122475Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T10:52:01.1123219Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1124043Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1128938Z #20 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T10:52:01.1129211Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T10:52:01.1129600Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T10:52:01.1129865Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1130131Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T10:52:01.1130498Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1130846Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1131157Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1131444Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1131741Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1132151Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1132520Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1132789Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1133151Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1133404Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1133782Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1134181Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1134558Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1134953Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1135317Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1135726Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1136089Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1136483Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1136860Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1137147Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:52:01.1137409Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1137769Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1138110Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1138422Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1138708Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1139016Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1139467Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1139830Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1140236Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1140598Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1140864Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1141282Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1141676Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1142051Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1142451Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1142815Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1143168Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1143466Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1143764Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1144029Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1144281Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1144655Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1145053Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1145429Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1145823Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1146181Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1146449Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1146809Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1147219Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1147581Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1147971Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1148346Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1148602Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1148964Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1149369Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1149727Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1150137Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1150499Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1150837Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1151212Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1151502Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1151809Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1152204Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1152563Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1152902Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1153263Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1153657Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1154032Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1154430Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1154804Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1155149Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1155444Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1155741Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1156040Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1156448Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1156819Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1157233Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1157618Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1158023Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1158406Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1158665Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1159040Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1159458Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1159826Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1160234Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1160615Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1160963Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1161277Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1161567Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1161870Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1162363Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1162734Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1163228Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1163599Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1164002Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1164385Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1164789Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1165308Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1165714Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1166082Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1166390Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:52:01.1166692Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:52:01.1166959Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:52:01.1167254Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:52:01.1167601Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:52:01.1167943Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:52:01.1168229Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:52:01.1168496Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:52:01.1168776Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:52:01.1168973Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:52:01.1169091Z #135 _start from ??:0
2025-12-04T10:52:01.1169210Z #136 <unwind unsupported> from ??:0
2025-12-04T10:52:01.1169217Z 
2025-12-04T10:52:01.1169222Z 
2025-12-04T10:52:01.1169438Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1170060Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1170066Z 
2025-12-04T10:52:01.1170334Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1170645Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____
2025-12-04T10:52:01.1170766Z Traceback (most recent call last):
2025-12-04T10:52:01.1171351Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1171489Z     correct = forward(*example_inputs)
2025-12-04T10:52:01.1171853Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward
2025-12-04T10:52:01.1172085Z     torch.ops.aten._scaled_dot_product_efficient_attention.default(
2025-12-04T10:52:01.1172466Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T10:52:01.1172580Z     return self._op(*args, **kwargs)
2025-12-04T10:52:01.1172766Z RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1173558Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first):
2025-12-04T10:52:01.1173669Z C++ CapturedTraceback:
2025-12-04T10:52:01.1175022Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:52:01.1175502Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:52:01.1175846Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T10:52:01.1177166Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<long>, std::optional<long>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1178930Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1186021Z #9 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1187418Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1189026Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&) const from x_0.cudafe1.cpp:0
2025-12-04T10:52:01.1189840Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1193974Z #13 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1194923Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1196037Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from VariableType_3.cpp:0
2025-12-04T10:52:01.1199494Z #16 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_3.cpp:0
2025-12-04T10:52:01.1200117Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T10:52:01.1201019Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1201850Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1206873Z #20 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T10:52:01.1207163Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T10:52:01.1207558Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T10:52:01.1207821Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1208086Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T10:52:01.1208456Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1208816Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1209113Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1209402Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1209709Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1210109Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1210477Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1210743Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1211104Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1211373Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1211735Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1212130Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1212504Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1212901Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1213279Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1213675Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1214038Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1214449Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1214811Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1215111Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:52:01.1215363Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1215725Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1216076Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1216380Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1216665Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1216974Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1217436Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1217812Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1218206Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1218567Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1218829Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1219249Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1219657Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1220017Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1220416Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1220789Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1221136Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1221445Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1221730Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1221998Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1222261Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1222622Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1223017Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1223393Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1223786Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1224159Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1224410Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1224772Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1225185Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1225542Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1225949Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1226315Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1226566Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1226940Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1227335Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1227694Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1228104Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1228463Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1228814Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1229183Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1229472Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1229778Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1230176Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1230549Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1230858Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1231218Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1231625Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1231990Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1232397Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1232757Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1233095Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1233404Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1233695Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1233988Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1234397Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1234769Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1235193Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1235564Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1235967Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1236350Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1236612Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1236998Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1237398Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1237766Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1238184Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1238553Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1238913Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1239216Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1239506Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1239820Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1240223Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1240591Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1241065Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1241433Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1241847Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1242273Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1242676Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1243125Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1243527Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1243908Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1244201Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:52:01.1244502Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:52:01.1244782Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:52:01.1245063Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:52:01.1245425Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:52:01.1245746Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:52:01.1246030Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:52:01.1246312Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:52:01.1246577Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:52:01.1246773Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:52:01.1246886Z #135 _start from ??:0
2025-12-04T10:52:01.1247008Z #136 <unwind unsupported> from ??:0
2025-12-04T10:52:01.1247015Z 
2025-12-04T10:52:01.1247019Z 
2025-12-04T10:52:01.1247248Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1247858Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1247864Z 
2025-12-04T10:52:01.1248130Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1248287Z =================================== FAILURES ===================================
2025-12-04T10:52:01.1248577Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____
2025-12-04T10:52:01.1248713Z Traceback (most recent call last):
2025-12-04T10:52:01.1249301Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1249425Z     correct = forward(*example_inputs)
2025-12-04T10:52:01.1249797Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward
2025-12-04T10:52:01.1250026Z     torch.ops.aten._scaled_dot_product_efficient_attention.default(
2025-12-04T10:52:01.1250391Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T10:52:01.1250517Z     return self._op(*args, **kwargs)
2025-12-04T10:52:01.1250689Z RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1251491Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first):
2025-12-04T10:52:01.1251598Z C++ CapturedTraceback:
2025-12-04T10:52:01.1252964Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:52:01.1253457Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:52:01.1253785Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T10:52:01.1255122Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<long>, std::optional<long>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1256880Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1263951Z #9 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1265359Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1266972Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&) const from x_0.cudafe1.cpp:0
2025-12-04T10:52:01.1267812Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1271934Z #13 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1272872Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1274004Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from VariableType_3.cpp:0
2025-12-04T10:52:01.1277423Z #16 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_3.cpp:0
2025-12-04T10:52:01.1278063Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T10:52:01.1278799Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1279637Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1284586Z #20 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T10:52:01.1284917Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T10:52:01.1285254Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T10:52:01.1285522Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1285779Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T10:52:01.1286170Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1286515Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1286821Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1287126Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1287423Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1287843Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1288208Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1288477Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1288844Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1289097Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1289474Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1289870Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1290246Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1290647Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1291008Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1291417Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1291784Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1292197Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1292559Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1292847Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:52:01.1293118Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1293484Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1293827Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1294140Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1294428Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1294804Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1295204Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1295567Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1295980Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1296345Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1296682Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1297046Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1297439Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1297819Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1298218Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1298580Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1298940Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1299238Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1299543Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1299808Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1300064Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1300437Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1300985Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1301362Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1301757Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1302120Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1302389Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1302751Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1303157Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1303519Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1303915Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1304291Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1304540Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1304900Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1305307Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1305670Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1306073Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1306432Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1306882Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1307195Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1307480Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1307788Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1308182Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1308630Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1308897Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1309260Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1309674Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1310032Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1310426Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1310800Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1311145Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1311448Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1311746Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1312041Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1312452Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1312829Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1313233Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1313617Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1314024Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1314409Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1314668Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1315036Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1315448Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1315820Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1316234Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1316602Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1316949Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1317265Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1317559Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1317857Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1318276Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1318704Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1319122Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1319488Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1319891Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1320271Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1320733Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1321113Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1321514Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1321888Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1322247Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:52:01.1322555Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:52:01.1322835Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:52:01.1323115Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:52:01.1323465Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:52:01.1323799Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:52:01.1324085Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:52:01.1324351Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:52:01.1324635Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:52:01.1324832Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:52:01.1324945Z #135 _start from ??:0
2025-12-04T10:52:01.1325064Z #136 <unwind unsupported> from ??:0
2025-12-04T10:52:01.1325070Z 
2025-12-04T10:52:01.1325074Z 
2025-12-04T10:52:01.1325285Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1325913Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1325922Z 
2025-12-04T10:52:01.1326186Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1326910Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3b77aed58497c4ef.xml -
2025-12-04T10:52:01.1327084Z =========================== short test summary info ============================
2025-12-04T10:52:01.1327786Z FAILED [0.0039s] inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile - RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1328577Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first):
2025-12-04T10:52:01.1328686Z C++ CapturedTraceback:
2025-12-04T10:52:01.1329974Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:52:01.1330450Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:52:01.1330842Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T10:52:01.1332178Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<long>, std::optional<long>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1333863Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1341016Z #9 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1342425Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1344032Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&) const from x_0.cudafe1.cpp:0
2025-12-04T10:52:01.1344791Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1348973Z #13 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1349941Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1351052Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from VariableType_3.cpp:0
2025-12-04T10:52:01.1354487Z #16 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_3.cpp:0
2025-12-04T10:52:01.1355110Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T10:52:01.1355849Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1356678Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1361585Z #20 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T10:52:01.1361852Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T10:52:01.1362232Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T10:52:01.1362497Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1362830Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T10:52:01.1363198Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1363539Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1363849Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1364138Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1364446Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1364844Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1365209Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1365473Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1365841Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1366094Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1366469Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1366868Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1367242Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1367639Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1367997Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1368403Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1368767Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1369178Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1369539Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1369831Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:52:01.1370100Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1370462Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1370817Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1371224Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1371528Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1371841Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1372237Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1372600Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1373158Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1373523Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1373786Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1374147Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1374546Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1374976Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1375369Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1375743Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1376086Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1376383Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1376684Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1376945Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1377195Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1377569Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1377967Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1378344Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1378739Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1379105Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1379374Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1379734Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1380142Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1380504Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1380905Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1381278Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1381529Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1381908Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1382304Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1382665Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1383075Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1383437Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1383797Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1384096Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1384386Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1384758Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1385157Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1385518Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1385785Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1386147Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1386616Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1386979Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1387376Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1387757Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1388101Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1388412Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1388699Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1388992Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1389402Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1389780Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1390185Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1390571Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1390979Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1391365Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1391623Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1391989Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1392402Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1392772Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1393190Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1393560Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1393917Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1394233Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1394524Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1394835Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1395234Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1395608Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1396021Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1396389Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1396874Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1397256Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1397658Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1398035Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1398437Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1398928Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1399225Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:52:01.1399528Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:52:01.1399811Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:52:01.1400090Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:52:01.1400436Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:52:01.1400774Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:52:01.1401273Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:52:01.1401558Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:52:01.1401824Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:52:01.1402018Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:52:01.1402221Z #135 _start from ??:0
2025-12-04T10:52:01.1402345Z #136 <unwind unsupported> from ??:0
2025-12-04T10:52:01.1402351Z 
2025-12-04T10:52:01.1402356Z 
2025-12-04T10:52:01.1402575Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1403199Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1403204Z 
2025-12-04T10:52:01.1403469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1403664Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:52:01.1403861Z ================== 1 failed, 95 deselected, 2 rerun in 15.64s ==================
2025-12-04T10:52:01.1403964Z Got exit code 1
2025-12-04T10:52:01.1404083Z Retrying single test...
2025-12-04T10:52:01.1404525Z W1204 10:50:55.407000 82979 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:01.1405065Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-7136c4750752341b.xml
2025-12-04T10:52:01.1405230Z ============================= test session starts ==============================
2025-12-04T10:52:01.1405575Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:52:01.1405693Z cachedir: .pytest_cache
2025-12-04T10:52:01.1406203Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:52:01.1406325Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:52:01.1406443Z configfile: pytest.ini
2025-12-04T10:52:01.1406980Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:52:01.1407203Z collecting ... collected 96 items / 95 deselected / 1 selected
2025-12-04T10:52:01.1407885Z stepcurrent: skipping 67 already run items. Running only test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1408132Z Running 1 items in this shard
2025-12-04T10:52:01.1408138Z 
2025-12-04T10:52:01.1409097Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:50:57.793158683 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor)
2025-12-04T10:52:01.1409610Z [W1204 10:50:57.793179857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:01.1409616Z 
2025-12-04T10:52:01.1409849Z ('RERUN', {'yellow': True}) [15.4292s] [100%]
2025-12-04T10:52:01.1410822Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:51:12.231610084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:01.1410828Z 
2025-12-04T10:52:01.1410968Z ('RERUN', {'yellow': True}) [0.0068s] [100%]
2025-12-04T10:52:01.1411947Z inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile [W1204 10:51:12.237549278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T10:52:01.1411952Z 
2025-12-04T10:52:01.1412052Z FAILED [0.0040s] [100%]
2025-12-04T10:52:01.1412057Z 
2025-12-04T10:52:01.1412211Z ==================================== RERUNS ====================================
2025-12-04T10:52:01.1412503Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____
2025-12-04T10:52:01.1412643Z Traceback (most recent call last):
2025-12-04T10:52:01.1413229Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1413355Z     correct = forward(*example_inputs)
2025-12-04T10:52:01.1413732Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward
2025-12-04T10:52:01.1413969Z     torch.ops.aten._scaled_dot_product_efficient_attention.default(
2025-12-04T10:52:01.1414337Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T10:52:01.1414464Z     return self._op(*args, **kwargs)
2025-12-04T10:52:01.1414630Z RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1415431Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first):
2025-12-04T10:52:01.1415540Z C++ CapturedTraceback:
2025-12-04T10:52:01.1416807Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:52:01.1417297Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:52:01.1417625Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T10:52:01.1418964Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<long>, std::optional<long>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1420659Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1427799Z #9 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1429257Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1430863Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&) const from x_0.cudafe1.cpp:0
2025-12-04T10:52:01.1431616Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1435718Z #13 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1436660Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1444286Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from VariableType_3.cpp:0
2025-12-04T10:52:01.1447817Z #16 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_3.cpp:0
2025-12-04T10:52:01.1448488Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T10:52:01.1449233Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1450051Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1454915Z #20 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T10:52:01.1455189Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T10:52:01.1455522Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T10:52:01.1455786Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1456054Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T10:52:01.1456421Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1456763Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1457168Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1457458Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1457768Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1458248Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1458613Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1458916Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1459280Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1459531Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1459907Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1460309Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1460683Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1461082Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1461443Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1461849Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1462209Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1462614Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1462974Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1463359Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:52:01.1463626Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1464049Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1464404Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1464698Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1464986Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1465293Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1465691Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1466056Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1466464Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1466822Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1467088Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1467449Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1467845Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1468218Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1468612Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1468985Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1469400Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1469695Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1470050Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1470311Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1470564Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1470975Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1471372Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1471791Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1472233Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1472598Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1472865Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1473228Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1473635Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1473998Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1474391Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1474759Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1475045Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1475405Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1475801Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1476180Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1476575Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1476955Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1477295Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1477592Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1477890Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1478189Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1478585Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1478963Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1479217Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1479595Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1479995Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1480358Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1480767Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1481197Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1481550Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1481887Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1482240Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1482552Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1482990Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1483382Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1483794Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1484169Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1484592Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1484962Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1485224Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1485606Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1486012Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1486398Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1486798Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1487168Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1487537Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1487840Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1488145Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1488443Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1488848Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1489231Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1489634Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1490020Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1490421Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1490786Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1491201Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1491568Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1491970Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1492349Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1492636Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:52:01.1493015Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:52:01.1493281Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:52:01.1493559Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:52:01.1493951Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:52:01.1494268Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:52:01.1494564Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:52:01.1494881Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:52:01.1495140Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:52:01.1495349Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:52:01.1495451Z #135 _start from ??:0
2025-12-04T10:52:01.1495575Z #136 <unwind unsupported> from ??:0
2025-12-04T10:52:01.1495587Z 
2025-12-04T10:52:01.1495607Z 
2025-12-04T10:52:01.1495824Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1496439Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1496448Z 
2025-12-04T10:52:01.1496723Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1497013Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____
2025-12-04T10:52:01.1497138Z Traceback (most recent call last):
2025-12-04T10:52:01.1497742Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1497865Z     correct = forward(*example_inputs)
2025-12-04T10:52:01.1498238Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward
2025-12-04T10:52:01.1498479Z     torch.ops.aten._scaled_dot_product_efficient_attention.default(
2025-12-04T10:52:01.1498845Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T10:52:01.1498973Z     return self._op(*args, **kwargs)
2025-12-04T10:52:01.1499141Z RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1499942Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first):
2025-12-04T10:52:01.1500054Z C++ CapturedTraceback:
2025-12-04T10:52:01.1501515Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:52:01.1502011Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:52:01.1502339Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T10:52:01.1503673Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<long>, std::optional<long>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1505477Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1512571Z #9 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1514068Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1515678Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&) const from x_0.cudafe1.cpp:0
2025-12-04T10:52:01.1516443Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1520565Z #13 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1521516Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1522739Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from VariableType_3.cpp:0
2025-12-04T10:52:01.1526223Z #16 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_3.cpp:0
2025-12-04T10:52:01.1526862Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T10:52:01.1527593Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1528432Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1533270Z #20 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T10:52:01.1533544Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T10:52:01.1533880Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T10:52:01.1534147Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1534406Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T10:52:01.1534789Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1535132Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1535489Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1535791Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1536083Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1536528Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1536892Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1537176Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1537553Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1537804Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1538181Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1538585Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1538946Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1539357Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1539719Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1540131Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1540490Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1540883Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1541257Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1541547Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:52:01.1541799Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1542174Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1542513Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1542822Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1543108Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1543401Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1543809Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1544174Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1544583Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1544943Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1545197Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1545573Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1545967Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1546324Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1546730Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1547172Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1547534Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1547831Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1548163Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1548435Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1548686Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1549098Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1549494Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1549853Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1550263Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1550623Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1550889Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1551247Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1551640Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1552016Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1552409Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1552768Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1553035Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1553397Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1553802Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1554162Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1554556Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1554933Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1555274Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1555584Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1555877Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1556172Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1556582Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1556948Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1557212Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1557576Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1557972Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1558346Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1558741Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1559164Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1559519Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1559851Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1560148Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1560439Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1560863Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1561249Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1561653Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1562038Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1562506Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1562879Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1563151Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1563519Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1563940Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1564307Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1564709Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1565098Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1565448Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1565754Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1566064Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1566367Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1566790Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1567164Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1567570Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1567956Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1568359Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1568741Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1569146Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1569514Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1569934Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1570301Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1570602Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:52:01.1570976Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:52:01.1571245Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:52:01.1571538Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:52:01.1571917Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:52:01.1572237Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:52:01.1572537Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:52:01.1572842Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:52:01.1573122Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:52:01.1573318Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:52:01.1573419Z #135 _start from ??:0
2025-12-04T10:52:01.1573557Z #136 <unwind unsupported> from ??:0
2025-12-04T10:52:01.1573567Z 
2025-12-04T10:52:01.1573572Z 
2025-12-04T10:52:01.1573793Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1574421Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1574430Z 
2025-12-04T10:52:01.1574694Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1574838Z =================================== FAILURES ===================================
2025-12-04T10:52:01.1575146Z _____ CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile _____
2025-12-04T10:52:01.1575266Z Traceback (most recent call last):
2025-12-04T10:52:01.1575852Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2596, in test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1575992Z     correct = forward(*example_inputs)
2025-12-04T10:52:01.1576361Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_repro.py", line 2549, in forward
2025-12-04T10:52:01.1576606Z     torch.ops.aten._scaled_dot_product_efficient_attention.default(
2025-12-04T10:52:01.1576973Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T10:52:01.1577090Z     return self._op(*args, **kwargs)
2025-12-04T10:52:01.1577275Z RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1578067Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first):
2025-12-04T10:52:01.1578191Z C++ CapturedTraceback:
2025-12-04T10:52:01.1579469Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:52:01.1579942Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:52:01.1580283Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T10:52:01.1581605Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<long>, std::optional<long>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1583445Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1590557Z #9 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1592023Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1593616Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&) const from x_0.cudafe1.cpp:0
2025-12-04T10:52:01.1594391Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1598556Z #13 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1599457Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1600607Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from VariableType_3.cpp:0
2025-12-04T10:52:01.1604331Z #16 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_3.cpp:0
2025-12-04T10:52:01.1604975Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T10:52:01.1605707Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1606537Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1611388Z #20 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T10:52:01.1611675Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T10:52:01.1611994Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T10:52:01.1612273Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1612531Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T10:52:01.1612898Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1613399Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1613701Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1613988Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1614347Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1614748Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1615184Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1615439Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1615803Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1616070Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1616435Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1616843Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1617206Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1617600Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1617975Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1618366Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1618738Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1619132Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1619497Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1619797Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:52:01.1620050Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1620411Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1620765Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1621065Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1621360Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1621653Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1622052Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1622427Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1622822Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1623203Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1623459Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1623824Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1624231Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1624592Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1625064Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1625424Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1625765Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1626109Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1626394Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1626685Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1626950Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1627313Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1627721Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1628087Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1628481Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1628856Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1629107Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1629478Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1629876Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1630238Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1630646Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1631010Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1631275Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1631634Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1632031Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1632401Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1632798Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1633158Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1633511Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1633811Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1634108Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1634400Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1634797Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1635169Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1635422Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1635796Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1636191Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1636551Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1637019Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1637380Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1637766Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1638063Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1638346Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1638678Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1639073Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1639444Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1639865Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1640234Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1640649Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1641019Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1641278Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1641661Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1642127Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1642513Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1642919Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1643286Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1643650Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1643954Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1644245Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1644561Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1644965Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1645347Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1645752Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1646121Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1646536Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1646905Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1647324Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1647698Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1648099Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1648479Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1648849Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:52:01.1649167Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:52:01.1649431Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:52:01.1649743Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:52:01.1650102Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:52:01.1650453Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:52:01.1650735Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:52:01.1651012Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:52:01.1651277Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:52:01.1651486Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:52:01.1651586Z #135 _start from ??:0
2025-12-04T10:52:01.1651704Z #136 <unwind unsupported> from ??:0
2025-12-04T10:52:01.1651710Z 
2025-12-04T10:52:01.1651715Z 
2025-12-04T10:52:01.1651946Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1652560Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1652566Z 
2025-12-04T10:52:01.1652843Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1653546Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-7136c4750752341b.xml -
2025-12-04T10:52:01.1653717Z =========================== short test summary info ============================
2025-12-04T10:52:01.1654432Z FAILED [0.0040s] inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile - RuntimeError: cutlassF: no kernel found to launch!
2025-12-04T10:52:01.1655215Z Exception raised from _efficient_attention_forward at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/attention.cu:1759 (most recent call first):
2025-12-04T10:52:01.1655356Z C++ CapturedTraceback:
2025-12-04T10:52:01.1656632Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T10:52:01.1657124Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T10:52:01.1657457Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T10:52:01.1658782Z #7 at::native::_efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<long>, std::optional<long>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1660488Z #8 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1667634Z #9 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___efficient_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1669115Z #10 at::_ops::_efficient_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, std::optional<c10::SymInt>, std::optional<c10::SymInt>, double, long, bool, std::optional<double>, std::optional<at::Tensor> const&, std::optional<long>) from ??:0
2025-12-04T10:52:01.1670719Z #11 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)::{lambda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&)#1}::operator()(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&) const from x_0.cudafe1.cpp:0
2025-12-04T10:52:01.1671487Z #12 at::native::_scaled_dot_product_efficient_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1675591Z #13 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T10:52:01.1676491Z #14 at::_ops::_scaled_dot_product_efficient_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from ??:0
2025-12-04T10:52:01.1677680Z #15 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>) from VariableType_3.cpp:0
2025-12-04T10:52:01.1681162Z #16 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_efficient_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, bool, double, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_3.cpp:0
2025-12-04T10:52:01.1681835Z #17 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T10:52:01.1682625Z #18 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1683467Z #19 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T10:52:01.1688294Z #20 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T10:52:01.1688581Z #21 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T10:52:01.1688902Z #22 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T10:52:01.1689189Z #23 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1689443Z #24 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T10:52:01.1689812Z #25 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1690171Z #26 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1690470Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1690774Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1691140Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1691542Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1691920Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1692205Z #32 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1692581Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1692878Z #34 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1693244Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1693654Z #36 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1694019Z #37 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1694415Z #38 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1694792Z #39 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1695190Z #40 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1695567Z #41 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1695962Z #42 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1696325Z #43 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1696622Z #44 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T10:52:01.1696871Z #45 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1697248Z #46 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1697589Z #47 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1697885Z #48 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1698181Z #49 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1698474Z #50 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1698886Z #51 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1699247Z #52 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1699645Z #53 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1700023Z #54 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1700277Z #55 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1700639Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1701279Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1701645Z #58 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1702055Z #59 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1702415Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1702758Z #61 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1703207Z #62 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1703497Z #63 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1703772Z #64 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T10:52:01.1704071Z #65 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1704434Z #66 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1704842Z #67 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1705244Z #68 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1705638Z #69 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1706013Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1706269Z #71 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1706643Z #72 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1707037Z #73 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1707398Z #74 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1707806Z #75 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1708168Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1708434Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1708794Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1709192Z #79 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1709562Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1709955Z #81 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1710327Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1710669Z #83 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1710970Z #84 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1711275Z #85 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1711570Z #86 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1711969Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1712342Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1712593Z #89 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1712968Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1713362Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1713725Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1714134Z #93 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1714493Z #94 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1714847Z #95 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1715203Z #96 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1715489Z #97 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1715827Z #98 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1716224Z #99 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1716704Z #100 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1717168Z #101 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1717539Z #102 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1717954Z #103 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1718329Z #104 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1718589Z #105 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T10:52:01.1718972Z #106 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1719378Z #107 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1719760Z #108 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1720163Z #109 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1720532Z #110 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1720895Z #111 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T10:52:01.1721201Z #112 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T10:52:01.1721504Z #113 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T10:52:01.1721804Z #114 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T10:52:01.1722266Z #115 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T10:52:01.1722655Z #116 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1723060Z #117 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1723446Z #118 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1723848Z #119 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1724221Z #120 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1724640Z #121 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1725008Z #122 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1725412Z #123 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T10:52:01.1725795Z #124 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T10:52:01.1726085Z #125 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T10:52:01.1726399Z #126 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T10:52:01.1726664Z #127 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T10:52:01.1726944Z #128 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T10:52:01.1727379Z #129 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T10:52:01.1727700Z #130 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T10:52:01.1728028Z #131 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T10:52:01.1728295Z #132 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T10:52:01.1728555Z #133 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T10:52:01.1728790Z #134 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T10:52:01.1728891Z #135 _start from ??:0
2025-12-04T10:52:01.1729009Z #136 <unwind unsupported> from ??:0
2025-12-04T10:52:01.1729015Z 
2025-12-04T10:52:01.1729032Z 
2025-12-04T10:52:01.1729246Z To execute this test, run the following from the base repo dir:
2025-12-04T10:52:01.1729855Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_repro.py CudaReproTests.test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1729862Z 
2025-12-04T10:52:01.1730138Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:52:01.1730315Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:52:01.1730509Z ================== 1 failed, 95 deselected, 2 rerun in 15.47s ==================
2025-12-04T10:52:01.1730619Z Got exit code 1
2025-12-04T10:52:01.1731148Z FAILED CONSISTENTLY: test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile
2025-12-04T10:52:01.1731564Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T10:52:01.1732001Z W1204 10:51:22.967000 83099 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:01.1732531Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-c9c365f110868c46.xml
2025-12-04T10:52:01.1732704Z ============================= test session starts ==============================
2025-12-04T10:52:01.1733050Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:52:01.1733173Z cachedir: .pytest_cache
2025-12-04T10:52:01.1733679Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:52:01.1733804Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:52:01.1733922Z configfile: pytest.ini
2025-12-04T10:52:01.1734452Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:52:01.1734666Z collecting ... collected 96 items / 68 deselected / 28 selected
2025-12-04T10:52:01.1734817Z stepcurrent: skipping 68 already run items.
2025-12-04T10:52:01.1734932Z Running 28 items in this shard
2025-12-04T10:52:01.1734941Z 
2025-12-04T10:52:01.1735312Z inductor/test_cuda_repro.py::CudaReproTests::test_red_dtype_mismatch PASSED [2.8342s] [  3%]
2025-12-04T10:52:01.1735698Z inductor/test_cuda_repro.py::CudaReproTests::test_reflection_pad_loop_order PASSED [0.6937s] [  7%]
2025-12-04T10:52:01.1736061Z inductor/test_cuda_repro.py::CudaReproTests::test_repeated_masked_load PASSED [0.4319s] [ 10%]
2025-12-04T10:52:01.1736422Z inductor/test_cuda_repro.py::CudaReproTests::test_scalar_triton_index PASSED [0.1718s] [ 14%]
2025-12-04T10:52:01.1737349Z inductor/test_cuda_repro.py::CudaReproTests::test_scaled_dot_product_efficient_attention_backward W1204 10:51:30.245000 83099 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:52:01.1737470Z PASSED [1.7852s] [ 17%]
2025-12-04T10:52:01.1737853Z inductor/test_cuda_repro.py::CudaReproTests::test_scatter_index_not_wrapped PASSED [0.5743s] [ 21%]
2025-12-04T10:52:01.1738579Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape0_quantiles_strides0_batch_size_16 PASSED [0.5497s] [ 25%]
2025-12-04T10:52:01.1739256Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape1_quantiles_strides1_batch_size_16 PASSED [0.5514s] [ 28%]
2025-12-04T10:52:01.1739968Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape2_quantiles_strides2_batch_size_16 PASSED [0.5632s] [ 32%]
2025-12-04T10:52:01.1740637Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape3_quantiles_strides3_batch_size_16 PASSED [0.5587s] [ 35%]
2025-12-04T10:52:01.1741323Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape4_quantiles_strides4_batch_size_16 PASSED [0.5434s] [ 39%]
2025-12-04T10:52:01.1741983Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape5_quantiles_strides5_batch_size_16 PASSED [0.5811s] [ 42%]
2025-12-04T10:52:01.1742650Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape6_quantiles_strides6_batch_size_16 PASSED [0.6076s] [ 46%]
2025-12-04T10:52:01.1743308Z inductor/test_cuda_repro.py::CudaReproTests::test_searchsorted_stride_permutations_quantiles_shape7_quantiles_strides7_batch_size_16 PASSED [0.5803s] [ 50%]
2025-12-04T10:52:01.1743727Z inductor/test_cuda_repro.py::CudaReproTests::test_selecsls42b_misaligned_address PASSED [2.1277s] [ 53%]
2025-12-04T10:52:01.1744049Z inductor/test_cuda_repro.py::CudaReproTests::test_simplify_dims PASSED [0.7630s] [ 57%]
2025-12-04T10:52:01.1744386Z inductor/test_cuda_repro.py::CudaReproTests::test_sort_stride_issue PASSED [0.3434s] [ 60%]
2025-12-04T10:52:01.1744712Z inductor/test_cuda_repro.py::CudaReproTests::test_sorted_masks PASSED [0.5264s] [ 64%]
2025-12-04T10:52:01.1745108Z inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_channels_last PASSED [0.2339s] [ 67%]
2025-12-04T10:52:01.1745505Z inductor/test_cuda_repro.py::CudaReproTests::test_split_reduction_transposed PASSED [0.0896s] [ 71%]
2025-12-04T10:52:01.1745845Z inductor/test_cuda_repro.py::CudaReproTests::test_triton_interpret PASSED [13.4658s] [ 75%]
2025-12-04T10:52:01.1746267Z inductor/test_cuda_repro.py::CudaReproTests::test_truediv_base_not_bitwise_equivalent PASSED [0.4508s] [ 78%]
2025-12-04T10:52:01.1746691Z inductor/test_cuda_repro.py::CudaReproTests::test_truediv_emulate_divison_rounding PASSED [2.3861s] [ 82%]
2025-12-04T10:52:01.1747010Z inductor/test_cuda_repro.py::CudaReproTests::test_uint_view_copy PASSED [0.0849s] [ 85%]
2025-12-04T10:52:01.1747379Z inductor/test_cuda_repro.py::CudaReproTests::test_unspec_inputs_interop PASSED [0.8697s] [ 89%]
2025-12-04T10:52:01.1747763Z inductor/test_cuda_repro.py::CudaReproTests::test_unused_cpu_input_cudagraphs PASSED [0.3046s] [ 92%]
2025-12-04T10:52:01.1748164Z inductor/test_cuda_repro.py::CudaReproTests::test_view_replay_padding_issue_163328 PASSED [0.6225s] [ 96%]
2025-12-04T10:52:01.1748531Z inductor/test_cuda_repro.py::CudaReproTests::test_xlnet_lm_stride_repro PASSED [0.5718s] [100%]
2025-12-04T10:52:01.1748536Z 
2025-12-04T10:52:01.1757085Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-c9c365f110868c46.xml -
2025-12-04T10:52:01.1757308Z ====================== 28 passed, 68 deselected in 33.94s ======================
2025-12-04T10:52:01.1758808Z The following tests failed consistently: ['test/inductor/test_cuda_repro.py::CudaReproTests::test_dont_inplace_disjoint_accesses', 'test/inductor/test_cuda_repro.py::CudaReproTests::test_effn_attn_bias_padding_misaligned', 'test/inductor/test_cuda_repro.py::CudaReproTests::test_qwen2_7b_sdpa_input_alignment_requires_recompile']
2025-12-04T10:52:01.1758821Z 
2025-12-04T10:52:01.1759339Z FINISHED PRINTING LOG FILE of inductor/test_cuda_repro 1/1 (test/test-reports/inductor.test_cuda_repro_1.1_4fd57cc505de7852_.log)
2025-12-04T10:52:01.1759345Z 
2025-12-04T10:52:01.1759841Z Finished inductor/test_cuda_repro 1/1 ... [2025-12-04 10:52:00.889853][5878.499753263], took 5.02min
2025-12-04T10:52:01.1760608Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a1f65e7d467aee95.xml
2025-12-04T10:52:01.1761418Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-e6d248469cfc058f.xml
2025-12-04T10:52:01.1762325Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-f3f2e4b24ff37d87.xml
2025-12-04T10:52:01.1763098Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-381f6a62351f53ee.xml
2025-12-04T10:52:01.1763868Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a962ee87389a597a.xml
2025-12-04T10:52:01.1764617Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a729e49bf29a928c.xml
2025-12-04T10:52:01.1765378Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a17633d8774721c5.xml
2025-12-04T10:52:01.1822892Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3b77aed58497c4ef.xml
2025-12-04T10:52:01.2121505Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-7136c4750752341b.xml
2025-12-04T10:52:01.2475120Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-c9c365f110868c46.xml
2025-12-04T10:52:01.5867737Z Uploading logs for 57119749427 to S3
2025-12-04T10:52:01.6405488Z Uploading artifacts took 0.37 seconds
2025-12-04T10:52:01.6405919Z inductor/test_cuda_repro 1/1 failed!
2025-12-04T10:52:01.6410889Z Running inductor/test_cudagraph_trees 1/1 ... [2025-12-04 10:52:01.640889][5879.250796684]
2025-12-04T10:52:01.6411484Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:52:01.6415405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cudagraph_trees.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:52:01.641316]
2025-12-04T10:55:24.4406254Z 
2025-12-04T10:55:24.4407193Z PRINTING LOG FILE of inductor/test_cudagraph_trees 1/1 (test/test-reports/inductor.test_cudagraph_trees_1.1_054bcfe63a557371_.log)
2025-12-04T10:55:24.4408522Z Test results will be stored in test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-805c3a8113d13722.xml
2025-12-04T10:55:24.4409424Z ============================= test session starts ==============================
2025-12-04T10:55:24.4410082Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:55:24.4410684Z cachedir: .pytest_cache
2025-12-04T10:55:24.4411393Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:55:24.4412174Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:55:24.4412517Z configfile: pytest.ini
2025-12-04T10:55:24.4413232Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:55:24.4414023Z collecting ... collected 166 items
2025-12-04T10:55:24.4414437Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T10:55:24.4493427Z Running 166 items in this shard: test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_accumulate_grad, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_accumulate_multiple_recordings, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_alias_of_parameter, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_output_checkpoint, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_static_parameter, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_storage_single_weakref, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliasing_static_ref, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_amp_cache_disabled, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_backward_gets_cached_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cache_hit_forward_miss_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cached_boxed_forward_device_index, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cached_forward_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_checkpoint_shared_output_storage_deallocation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_checkpointing_resets_persistent_refs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cleanup, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_compiled_autograd_static_input_params, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_constant_output, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_conv_benchmark, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cpp_wrapper, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes1, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_or_error, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_dynamic_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_dynamic_warmup, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_empty_cpu_tensor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_empty_storage, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_end_recording_early, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_error_on_dealloc_use, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_error_on_dealloc_use2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_execution_into_recording, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_expanded_inputs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_due_to_cudagraph_managed_tensor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_warn_only_once, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_generation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_with_skipped_cudagraphed_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_frozen_fn, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_function_compiled_multiple_times, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_buffer_reuse, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_condition_op, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_only, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_op_and_dynamic_shapes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar1, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar3, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar4, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_device_put, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_multiple, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_mutation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_tensor_symints, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_dynamoc_shapes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_mutation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_mutation_late_free, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_no_split, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_rule, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_dynamic_scalar_inputs, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_dynamic_shapes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_foreach_op, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_backward_not_called, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_with_skipped_cudagraphed_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_fused_scheduler_node, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_gc, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_item, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_kernel_reuse_autotune_at_compile_time_False, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_kernel_reuse_autotune_at_compile_time_True, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_log_message, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_multiple_devices_msg, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reduce_overhead_mode_effectiveness, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu_interleave, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency1, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_simple, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_cat_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_from_mutation_index, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_from_nested_indirect_indexing, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint_multi_output_layout, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_view_fallback, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_with_memory_plan_reuse, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_item, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_backend, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_graph_breaks, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_index_put, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_live_outputs_multiple_graphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_manager_per_device, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mark_step, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_meta_tensor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_child_node, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_custom_module, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_custom_module_buffer, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_parent_node, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module_buffers, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_param_inputs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multinomial, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_insert_removal_caching, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_on_inp_backend_cudagraphs, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_on_inp_backend_inductor, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_reinplaced, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_no_rerecord_with_mark_static_address, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_not_fallback_to_eager_if_have_not_recompiling_too_many_times, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_output_alias, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_peristed_output_livenes, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_remove_hooks_on_cached_tensors, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rerecord_if_static_input_address_changed, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rng_non_trees, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rng_trees, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_run_simple, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_separate_recordings, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_side_stream_memory_allocation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_single_stream_use, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_cpp_wrapper, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_cudagraph_unsafe_ops, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached1, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached2, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_symbolic, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_sparsity, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_static_inputs_address_mutation_log, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_storage_access_error, 
test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_constant_mutation, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_dies_between_checkpoint, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_no_longer_in_pool, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_no_cudagraphs, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_non_trees, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_trees, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_parameter, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unstable_ptr, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warmup_stream_sync, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warn_on_pending_backward, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warn_once_if_dynamic_shape_limit_reached, test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_workspace_allocation_error, test/inductor/test_cudagraph_trees.py::TestSAC::test_cpu_and_cuda_rng, test/inductor/test_cudagraph_trees.py::TestSAC::test_cudagraph_uneven_forward_backward, test/inductor/test_cudagraph_trees.py::TestSAC::test_cudagraphs_aot_eager_compat_equal, test/inductor/test_cudagraph_trees.py::TestSAC::test_cudagraphs_aot_eager_compat_equal_device_one, test/inductor/test_cudagraph_trees.py::TestSAC::test_graph_partition_cudagraphs_aot_eager_compat_equal, test/inductor/test_cudagraph_trees.py::TestSAC::test_multi_device, test/inductor/test_cudagraph_trees.py::TestSAC::test_retain_graph, test/inductor/test_cudagraph_trees.py::TestSAC::test_simple, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order0, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order1, 
test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order2, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order3, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order4, test/inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order5
2025-12-04T10:55:24.4570964Z 
2025-12-04T10:55:24.4571398Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_accumulate_grad PASSED [4.7031s] [  0%]
2025-12-04T10:55:24.4572411Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_accumulate_multiple_recordings PASSED [1.5793s] [  1%]
2025-12-04T10:55:24.4573410Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_alias_of_parameter PASSED [0.3948s] [  1%]
2025-12-04T10:55:24.4574539Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_output_checkpoint PASSED [0.1994s] [  2%]
2025-12-04T10:55:24.4575562Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_static_parameter PASSED [0.1922s] [  3%]
2025-12-04T10:55:24.4577161Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliased_storage_single_weakref W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] Graph break from `Tensor.item()`, consider setting:
2025-12-04T10:55:24.4578821Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0]     torch._dynamo.config.capture_scalar_outputs = True
2025-12-04T10:55:24.4579852Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] or:
2025-12-04T10:55:24.4580803Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0]     env TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1
2025-12-04T10:55:24.4581971Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] to include these operations in the captured graph.
2025-12-04T10:55:24.4582933Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] 
2025-12-04T10:55:24.4583805Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] Graph break: from user code at:
2025-12-04T10:55:24.4585307Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0]   File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 1280, in torch_dynamo_resume_in_foo_at_1278
2025-12-04T10:55:24.4586748Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0]     x_alias2 = x[ind:]
2025-12-04T10:55:24.4587588Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] 
2025-12-04T10:55:24.4588307Z W1204 10:52:18.072000 84145 site-packages/torch/_dynamo/variables/tensor.py:1073] [1/0] 
2025-12-04T10:55:24.4588853Z PASSED [0.4177s] [  3%]
2025-12-04T10:55:24.4589882Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_aliasing_static_ref W1204 10:52:19.384000 84145 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:55:24.4590953Z PASSED [1.5164s] [  4%]
2025-12-04T10:55:24.4591527Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_amp_cache_disabled PASSED [0.7626s] [  4%]
2025-12-04T10:55:24.4592541Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_backward_gets_cached_cudagraphs PASSED [1.8677s] [  5%]
2025-12-04T10:55:24.4593605Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cache_hit_forward_miss_backward PASSED [1.7867s] [  6%]
2025-12-04T10:55:24.4594817Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cached_boxed_forward_device_index SKIPPED [0.0004s] (requires multiple cuda devices) [  6%]
2025-12-04T10:55:24.4596019Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cached_forward_backward PASSED [1.3256s] [  7%]
2025-12-04T10:55:24.4597122Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_checkpoint_shared_output_storage_deallocation PASSED [0.2036s] [  7%]
2025-12-04T10:55:24.4598282Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_checkpointing_resets_persistent_refs PASSED [0.4354s] [  8%]
2025-12-04T10:55:24.4599259Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cleanup PASSED [0.6753s] [  9%]
2025-12-04T10:55:24.4600236Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_compiled_autograd_static_input_params PASSED [1.0392s] [  9%]
2025-12-04T10:55:24.4601405Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_constant_output PASSED [0.7132s] [ 10%]
2025-12-04T10:55:24.4602361Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_conv_benchmark PASSED [2.0846s] [ 10%]
2025-12-04T10:55:24.4603249Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cpp_wrapper PASSED [2.3583s] [ 11%]
2025-12-04T10:55:24.4604302Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes PASSED [1.0990s] [ 12%]
2025-12-04T10:55:24.4605309Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes1 PASSED [0.5442s] [ 12%]
2025-12-04T10:55:24.4606311Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_capture_sizes2 PASSED [0.5570s] [ 13%]
2025-12-04T10:55:24.4607323Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_cudagraph_or_error PASSED [0.3805s] [ 13%]
2025-12-04T10:55:24.4608267Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_dynamic_backward PASSED [1.5615s] [ 14%]
2025-12-04T10:55:24.4609232Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_dynamic_warmup PASSED [0.2247s] [ 15%]
2025-12-04T10:55:24.4610148Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_empty_cpu_tensor PASSED [0.4109s] [ 15%]
2025-12-04T10:55:24.4611048Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_empty_storage PASSED [0.7172s] [ 16%]
2025-12-04T10:55:24.4611971Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_end_recording_early PASSED [0.7367s] [ 16%]
2025-12-04T10:55:24.4612925Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_error_on_dealloc_use PASSED [0.3945s] [ 17%]
2025-12-04T10:55:24.4613885Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_error_on_dealloc_use2 PASSED [0.3941s] [ 18%]
2025-12-04T10:55:24.4614853Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_execution_into_recording PASSED [0.7579s] [ 18%]
2025-12-04T10:55:24.4615815Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_expanded_inputs PASSED [0.4273s] [ 19%]
2025-12-04T10:55:24.4616873Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times PASSED [0.4838s] [ 19%]
2025-12-04T10:55:24.4618224Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_due_to_cudagraph_managed_tensor PASSED [0.5743s] [ 20%]
2025-12-04T10:55:24.4619627Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_fallback_to_eager_if_recompiling_too_many_times_warn_only_once PASSED [0.4842s] [ 21%]
2025-12-04T10:55:24.4620761Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward PASSED [0.8052s] [ 21%]
2025-12-04T10:55:24.4621829Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_cudagraphs PASSED [0.4135s] [ 22%]
2025-12-04T10:55:24.4623026Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_backward_not_called_backend_inductor PASSED [0.6667s] [ 22%]
2025-12-04T10:55:24.4624080Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_generation PASSED [0.8775s] [ 23%]
2025-12-04T10:55:24.4625128Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_forward_with_skipped_cudagraphed_backward PASSED [0.5242s] [ 24%]
2025-12-04T10:55:24.4626129Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_frozen_fn PASSED [0.3914s] [ 24%]
2025-12-04T10:55:24.4627095Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_function_compiled_multiple_times PASSED [0.6896s] [ 25%]
2025-12-04T10:55:24.4628455Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition W1204 10:52:47.385000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4629736Z W1204 10:52:47.387000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4630366Z PASSED [1.0857s] [ 25%]
2025-12-04T10:55:24.4631361Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_buffer_reuse W1204 10:52:48.525000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4632682Z W1204 10:52:48.527000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4633589Z W1204 10:52:48.531000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4634552Z W1204 10:52:48.533000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4635179Z PASSED [1.1736s] [ 26%]
2025-12-04T10:55:24.4635796Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_condition_op PASSED [1.0909s] [ 27%]
2025-12-04T10:55:24.4636860Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_only PASSED [1.7430s] [ 27%]
2025-12-04T10:55:24.4638309Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_op_and_dynamic_shapes W1204 10:52:52.484000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4639729Z W1204 10:52:52.486000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4640621Z W1204 10:52:53.443000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program
2025-12-04T10:55:24.4641527Z W1204 10:52:53.445000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program
2025-12-04T10:55:24.4642150Z PASSED [2.2289s] [ 28%]
2025-12-04T10:55:24.4643186Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar1 W1204 10:52:54.632000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4644233Z PASSED [0.9696s] [ 28%]
2025-12-04T10:55:24.4645216Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar2 W1204 10:52:55.609000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4646261Z PASSED [0.9850s] [ 29%]
2025-12-04T10:55:24.4647230Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar3 W1204 10:52:56.599000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4648267Z PASSED [0.9807s] [ 30%]
2025-12-04T10:55:24.4649249Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar4 W1204 10:52:57.580000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4650285Z PASSED [0.9836s] [ 30%]
2025-12-04T10:55:24.4651303Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_device_put W1204 10:52:58.558000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4652681Z W1204 10:52:58.559000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4653306Z PASSED [0.9402s] [ 31%]
2025-12-04T10:55:24.4654331Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_multiple W1204 10:52:59.506000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4655401Z PASSED [0.9976s] [ 31%]
2025-12-04T10:55:24.4656422Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_scalar_mutation W1204 10:53:00.497000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4657793Z W1204 10:53:00.500000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4658419Z PASSED [0.9565s] [ 32%]
2025-12-04T10:55:24.4659060Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_cpu_tensor_symints PASSED [2.0533s] [ 33%]
2025-12-04T10:55:24.4660109Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op PASSED [0.6010s] [ 33%]
2025-12-04T10:55:24.4661200Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_dynamoc_shapes PASSED [0.8776s] [ 34%]
2025-12-04T10:55:24.4662328Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_mutation PASSED [0.4727s] [ 34%]
2025-12-04T10:55:24.4663452Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_mutation_late_free PASSED [0.5923s] [ 35%]
2025-12-04T10:55:24.4664685Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_op_no_split PASSED [0.7775s] [ 36%]
2025-12-04T10:55:24.4665743Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_custom_rule PASSED [0.9076s] [ 36%]
2025-12-04T10:55:24.4667223Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_dynamic_scalar_inputs W1204 10:53:07.928000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4668628Z W1204 10:53:07.930000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4669537Z W1204 10:53:08.783000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program
2025-12-04T10:55:24.4670437Z W1204 10:53:08.786000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program
2025-12-04T10:55:24.4671064Z PASSED [1.5044s] [ 37%]
2025-12-04T10:55:24.4671703Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_dynamic_shapes PASSED [0.5995s] [ 37%]
2025-12-04T10:55:24.4672747Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_foreach_op PASSED [0.4453s] [ 38%]
2025-12-04T10:55:24.4674178Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_backward W1204 10:53:10.394000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4675535Z W1204 10:53:10.399000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4676150Z PASSED [1.3426s] [ 39%]
2025-12-04T10:55:24.4676847Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_backward_not_called PASSED [0.6847s] [ 39%]
2025-12-04T10:55:24.4678090Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_forward_with_skipped_cudagraphed_backward PASSED [0.5465s] [ 40%]
2025-12-04T10:55:24.4679311Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_fused_scheduler_node PASSED [0.4658s] [ 40%]
2025-12-04T10:55:24.4680329Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_gc PASSED [0.6708s] [ 41%]
2025-12-04T10:55:24.4681286Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_item PASSED [0.4352s] [ 42%]
2025-12-04T10:55:24.4682854Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_kernel_reuse_autotune_at_compile_time_False W1204 10:53:14.489000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4684345Z W1204 10:53:14.491000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4685238Z W1204 10:53:14.492000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4685864Z PASSED [1.0453s] [ 42%]
2025-12-04T10:55:24.4686999Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_kernel_reuse_autotune_at_compile_time_True W1204 10:53:15.531000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4688461Z W1204 10:53:15.533000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4689355Z W1204 10:53:15.534000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4689977Z PASSED [1.0688s] [ 43%]
2025-12-04T10:55:24.4690605Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_log_message PASSED [0.9958s] [ 43%]
2025-12-04T10:55:24.4691827Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_multiple_devices_msg SKIPPED [0.0003s] (requires multiple cuda devices) [ 44%]
2025-12-04T10:55:24.4693631Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reduce_overhead_mode_effectiveness W1204 10:53:17.656000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4695077Z W1204 10:53:17.658000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4695699Z PASSED [1.0819s] [ 45%]
2025-12-04T10:55:24.4696406Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu PASSED [1.1381s] [ 45%]
2025-12-04T10:55:24.4697918Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_cpu_and_gpu_interleave W1204 10:53:19.938000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4699085Z PASSED [1.2184s] [ 46%]
2025-12-04T10:55:24.4699833Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency PASSED [0.8235s] [ 46%]
2025-12-04T10:55:24.4701249Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_reorder_custom_op_with_no_dependency1 PASSED [0.8916s] [ 47%]
2025-12-04T10:55:24.4702757Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_simple W1204 10:53:22.818000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4704071Z W1204 10:53:22.820000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4704702Z PASSED [1.1522s] [ 48%]
2025-12-04T10:55:24.4705653Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint W1204 10:53:23.973000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4706963Z W1204 10:53:23.975000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4707871Z W1204 10:53:24.891000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program
2025-12-04T10:55:24.4708772Z W1204 10:53:24.893000 84145 site-packages/torch/_inductor/utils.py:2565] [0/1] DeviceCopy in input program
2025-12-04T10:55:24.4709380Z PASSED [2.0956s] [ 48%]
2025-12-04T10:55:24.4710050Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_cat_backward PASSED [1.5954s] [ 49%]
2025-12-04T10:55:24.4711186Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_from_mutation_index PASSED [0.8360s] [ 50%]
2025-12-04T10:55:24.4712409Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_symint_from_nested_indirect_indexing PASSED [0.6498s] [ 50%]
2025-12-04T10:55:24.4713941Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint W1204 10:53:29.215000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4715300Z W1204 10:53:29.217000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4715920Z PASSED [1.1823s] [ 51%]
2025-12-04T10:55:24.4716666Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_unbacked_symint_multi_output_layout PASSED [0.9958s] [ 51%]
2025-12-04T10:55:24.4718263Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse W1204 10:53:31.629000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4719696Z W1204 10:53:31.631000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4720349Z ('RERUN', {'yellow': True}) [1.4081s] [ 52%]
2025-12-04T10:55:24.4721521Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse W1204 10:53:32.756000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4723009Z W1204 10:53:32.758000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4723651Z ('RERUN', {'yellow': True}) [1.3146s] [ 52%]
2025-12-04T10:55:24.4724960Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse W1204 10:53:34.075000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4726392Z W1204 10:53:34.076000 84145 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4727045Z FAILED [1.3174s] [ 52%]
2025-12-04T10:55:24.4727761Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse ERROR [0.0001s] [ 52%]
2025-12-04T10:55:24.4728502Z 
2025-12-04T10:55:24.4728646Z ==================================== RERUNS ====================================
2025-12-04T10:55:24.4729242Z ___ CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse ___
2025-12-04T10:55:24.4729788Z Traceback (most recent call last):
2025-12-04T10:55:24.4730642Z   File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4171, in test_graph_partition_user_defined_triton_kernel_reuse
2025-12-04T10:55:24.4731533Z     self.assertEqual(eager_out, compiled_out)
2025-12-04T10:55:24.4732258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T10:55:24.4732996Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T10:55:24.4733819Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T10:55:24.4734687Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T10:55:24.4735154Z AssertionError: Tensor-likes are not close!
2025-12-04T10:55:24.4735430Z 
2025-12-04T10:55:24.4735549Z Mismatched elements: 64 / 128 (50.0%)
2025-12-04T10:55:24.4736105Z Greatest absolute difference: 2.7803521156311035 at index (65,) (up to 1e-05 allowed)
2025-12-04T10:55:24.4736801Z Greatest relative difference: inf at index (64,) (up to 1.3e-06 allowed)
2025-12-04T10:55:24.4737206Z 
2025-12-04T10:55:24.4737330Z The failure occurred for item [0]
2025-12-04T10:55:24.4737567Z 
2025-12-04T10:55:24.4737780Z To execute this test, run the following from the base repo dir:
2025-12-04T10:55:24.4738795Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cudagraph_trees.py CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse
2025-12-04T10:55:24.4739584Z 
2025-12-04T10:55:24.4739848Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:55:24.4740477Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:55:24.4740955Z frames [('total', 1), ('ok', 1)]
2025-12-04T10:55:24.4741331Z stats [('calls_captured', 7), ('unique_graphs', 1)]
2025-12-04T10:55:24.4741918Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T10:55:24.4743019Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:55:24.4743913Z graph_break []
2025-12-04T10:55:24.4744270Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:55:24.4744819Z cudagraph partition due to non gpu ops. Found from : 
2025-12-04T10:55:24.4745514Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo
2025-12-04T10:55:24.4746151Z     output1_cpu = output1.cpu() + 1
2025-12-04T10:55:24.4746382Z 
2025-12-04T10:55:24.4746516Z cudagraph partition due to non gpu ops
2025-12-04T10:55:24.4746973Z cudagraph partition due to DeviceCopy ops. Found from : 
2025-12-04T10:55:24.4747672Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo
2025-12-04T10:55:24.4748289Z     x2 = output1_cpu.to("cuda")
2025-12-04T10:55:24.4748508Z 
2025-12-04T10:55:24.4748635Z cudagraph partition into 3 partitions
2025-12-04T10:55:24.4749265Z ___ CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse ___
2025-12-04T10:55:24.4749839Z Traceback (most recent call last):
2025-12-04T10:55:24.4750668Z   File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4171, in test_graph_partition_user_defined_triton_kernel_reuse
2025-12-04T10:55:24.4751594Z     self.assertEqual(eager_out, compiled_out)
2025-12-04T10:55:24.4752323Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T10:55:24.4753055Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T10:55:24.4753899Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T10:55:24.4754768Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T10:55:24.4755249Z AssertionError: Tensor-likes are not close!
2025-12-04T10:55:24.4755510Z 
2025-12-04T10:55:24.4755628Z Mismatched elements: 64 / 128 (50.0%)
2025-12-04T10:55:24.4756179Z Greatest absolute difference: 2.7356221675872803 at index (90,) (up to 1e-05 allowed)
2025-12-04T10:55:24.4756891Z Greatest relative difference: inf at index (64,) (up to 1.3e-06 allowed)
2025-12-04T10:55:24.4757279Z 
2025-12-04T10:55:24.4757413Z The failure occurred for item [0]
2025-12-04T10:55:24.4757638Z 
2025-12-04T10:55:24.4757848Z To execute this test, run the following from the base repo dir:
2025-12-04T10:55:24.4758849Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cudagraph_trees.py CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse
2025-12-04T10:55:24.4759635Z 
2025-12-04T10:55:24.4759910Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:55:24.4760529Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:55:24.4760983Z frames [('total', 1), ('ok', 1)]
2025-12-04T10:55:24.4761357Z stats [('calls_captured', 7), ('unique_graphs', 1)]
2025-12-04T10:55:24.4761952Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T10:55:24.4763091Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:55:24.4763973Z graph_break []
2025-12-04T10:55:24.4764340Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:55:24.4764884Z cudagraph partition due to non gpu ops. Found from : 
2025-12-04T10:55:24.4765557Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo
2025-12-04T10:55:24.4766311Z     output1_cpu = output1.cpu() + 1
2025-12-04T10:55:24.4766545Z 
2025-12-04T10:55:24.4766687Z cudagraph partition due to non gpu ops
2025-12-04T10:55:24.4767126Z cudagraph partition due to DeviceCopy ops. Found from : 
2025-12-04T10:55:24.4767825Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo
2025-12-04T10:55:24.4768456Z     x2 = output1_cpu.to("cuda")
2025-12-04T10:55:24.4768665Z 
2025-12-04T10:55:24.4768809Z cudagraph partition into 3 partitions
2025-12-04T10:55:24.4769259Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:55:24.4769732Z frames [('total', 1), ('ok', 1)]
2025-12-04T10:55:24.4770103Z stats [('calls_captured', 7), ('unique_graphs', 1)]
2025-12-04T10:55:24.4770683Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T10:55:24.4771777Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:55:24.4772657Z graph_break []
2025-12-04T10:55:24.4773026Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:55:24.4773549Z cudagraph partition due to non gpu ops. Found from : 
2025-12-04T10:55:24.4774341Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo
2025-12-04T10:55:24.4774986Z     output1_cpu = output1.cpu() + 1
2025-12-04T10:55:24.4775215Z 
2025-12-04T10:55:24.4775345Z cudagraph partition due to non gpu ops
2025-12-04T10:55:24.4775830Z cudagraph partition due to DeviceCopy ops. Found from : 
2025-12-04T10:55:24.4776525Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo
2025-12-04T10:55:24.4777148Z     x2 = output1_cpu.to("cuda")
2025-12-04T10:55:24.4777354Z 
2025-12-04T10:55:24.4777521Z cudagraph partition into 3 partitions
2025-12-04T10:55:24.4777912Z ==================================== ERRORS ====================================
2025-12-04T10:55:24.4778567Z _ ERROR at teardown of CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse _
2025-12-04T10:55:24.4779189Z Traceback (most recent call last):
2025-12-04T10:55:24.4779805Z   File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 174, in tearDown
2025-12-04T10:55:24.4780482Z     self.assertEqual(all_live_block_count(), 0)
2025-12-04T10:55:24.4781215Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T10:55:24.4781955Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T10:55:24.4782773Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T10:55:24.4783641Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T10:55:24.4784106Z AssertionError: Scalars are not equal!
2025-12-04T10:55:24.4784353Z 
2025-12-04T10:55:24.4784457Z Expected 0 but got 2.
2025-12-04T10:55:24.4784741Z Absolute difference: 2
2025-12-04T10:55:24.4785029Z Relative difference: inf
2025-12-04T10:55:24.4785425Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:55:24.4785897Z frames [('total', 1), ('ok', 1)]
2025-12-04T10:55:24.4786271Z stats [('calls_captured', 7), ('unique_graphs', 1)]
2025-12-04T10:55:24.4786849Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T10:55:24.4787944Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:55:24.4788826Z graph_break []
2025-12-04T10:55:24.4789200Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:55:24.4789723Z cudagraph partition due to non gpu ops. Found from : 
2025-12-04T10:55:24.4790403Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo
2025-12-04T10:55:24.4791035Z     output1_cpu = output1.cpu() + 1
2025-12-04T10:55:24.4791265Z 
2025-12-04T10:55:24.4791391Z cudagraph partition due to non gpu ops
2025-12-04T10:55:24.4791995Z cudagraph partition due to DeviceCopy ops. Found from : 
2025-12-04T10:55:24.4792702Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo
2025-12-04T10:55:24.4793321Z     x2 = output1_cpu.to("cuda")
2025-12-04T10:55:24.4793529Z 
2025-12-04T10:55:24.4793658Z cudagraph partition into 3 partitions
2025-12-04T10:55:24.4794129Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:55:24.4794601Z frames [('total', 1), ('ok', 1)]
2025-12-04T10:55:24.4794958Z stats [('calls_captured', 7), ('unique_graphs', 1)]
2025-12-04T10:55:24.4795557Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T10:55:24.4796652Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:55:24.4797534Z graph_break []
2025-12-04T10:55:24.4797895Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:55:24.4798531Z cudagraph partition due to non gpu ops. Found from : 
2025-12-04T10:55:24.4799214Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo
2025-12-04T10:55:24.4799855Z     output1_cpu = output1.cpu() + 1
2025-12-04T10:55:24.4800126Z 
2025-12-04T10:55:24.4800256Z cudagraph partition due to non gpu ops
2025-12-04T10:55:24.4800713Z cudagraph partition due to DeviceCopy ops. Found from : 
2025-12-04T10:55:24.4801572Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo
2025-12-04T10:55:24.4802651Z     x2 = output1_cpu.to("cuda")
2025-12-04T10:55:24.4802877Z 
2025-12-04T10:55:24.4803007Z cudagraph partition into 3 partitions
2025-12-04T10:55:24.4803473Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:55:24.4803925Z frames [('total', 1), ('ok', 1)]
2025-12-04T10:55:24.4804298Z stats [('calls_captured', 7), ('unique_graphs', 1)]
2025-12-04T10:55:24.4804897Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T10:55:24.4805986Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:55:24.4806855Z graph_break []
2025-12-04T10:55:24.4807219Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:55:24.4807754Z cudagraph partition due to non gpu ops. Found from : 
2025-12-04T10:55:24.4808436Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo
2025-12-04T10:55:24.4809053Z     output1_cpu = output1.cpu() + 1
2025-12-04T10:55:24.4809295Z 
2025-12-04T10:55:24.4809424Z cudagraph partition due to non gpu ops
2025-12-04T10:55:24.4809868Z cudagraph partition due to DeviceCopy ops. Found from : 
2025-12-04T10:55:24.4810548Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo
2025-12-04T10:55:24.4811171Z     x2 = output1_cpu.to("cuda")
2025-12-04T10:55:24.4811377Z 
2025-12-04T10:55:24.4811516Z cudagraph partition into 3 partitions
2025-12-04T10:55:24.4811911Z =================================== FAILURES ===================================
2025-12-04T10:55:24.4812497Z ___ CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse ___
2025-12-04T10:55:24.4813065Z Traceback (most recent call last):
2025-12-04T10:55:24.4813909Z   File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4171, in test_graph_partition_user_defined_triton_kernel_reuse
2025-12-04T10:55:24.4814790Z     self.assertEqual(eager_out, compiled_out)
2025-12-04T10:55:24.4815516Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T10:55:24.4816268Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T10:55:24.4817088Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T10:55:24.4817942Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T10:55:24.4818420Z AssertionError: Tensor-likes are not close!
2025-12-04T10:55:24.4818684Z 
2025-12-04T10:55:24.4818818Z Mismatched elements: 64 / 128 (50.0%)
2025-12-04T10:55:24.4819347Z Greatest absolute difference: 2.709859848022461 at index (126,) (up to 1e-05 allowed)
2025-12-04T10:55:24.4820051Z Greatest relative difference: inf at index (64,) (up to 1.3e-06 allowed)
2025-12-04T10:55:24.4820450Z 
2025-12-04T10:55:24.4820566Z The failure occurred for item [0]
2025-12-04T10:55:24.4820789Z 
2025-12-04T10:55:24.4821010Z To execute this test, run the following from the base repo dir:
2025-12-04T10:55:24.4821993Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cudagraph_trees.py CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse
2025-12-04T10:55:24.4822792Z 
2025-12-04T10:55:24.4823169Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:55:24.4823800Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:55:24.4824273Z frames [('total', 1), ('ok', 1)]
2025-12-04T10:55:24.4824633Z stats [('calls_captured', 7), ('unique_graphs', 1)]
2025-12-04T10:55:24.4825282Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T10:55:24.4826377Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:55:24.4827286Z graph_break []
2025-12-04T10:55:24.4827645Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:55:24.4828185Z cudagraph partition due to non gpu ops. Found from : 
2025-12-04T10:55:24.4828868Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo
2025-12-04T10:55:24.4829490Z     output1_cpu = output1.cpu() + 1
2025-12-04T10:55:24.4829733Z 
2025-12-04T10:55:24.4829861Z cudagraph partition due to non gpu ops
2025-12-04T10:55:24.4830305Z cudagraph partition due to DeviceCopy ops. Found from : 
2025-12-04T10:55:24.4831001Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo
2025-12-04T10:55:24.4831616Z     x2 = output1_cpu.to("cuda")
2025-12-04T10:55:24.4831837Z 
2025-12-04T10:55:24.4831964Z cudagraph partition into 3 partitions
2025-12-04T10:55:24.4832423Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:55:24.4832879Z frames [('total', 1), ('ok', 1)]
2025-12-04T10:55:24.4833247Z stats [('calls_captured', 7), ('unique_graphs', 1)]
2025-12-04T10:55:24.4833836Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T10:55:24.4834931Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:55:24.4835801Z graph_break []
2025-12-04T10:55:24.4836169Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:55:24.4836699Z cudagraph partition due to non gpu ops. Found from : 
2025-12-04T10:55:24.4837364Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo
2025-12-04T10:55:24.4837999Z     output1_cpu = output1.cpu() + 1
2025-12-04T10:55:24.4838238Z 
2025-12-04T10:55:24.4838368Z cudagraph partition due to non gpu ops
2025-12-04T10:55:24.4838816Z cudagraph partition due to DeviceCopy ops. Found from : 
2025-12-04T10:55:24.4839505Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo
2025-12-04T10:55:24.4840126Z     x2 = output1_cpu.to("cuda")
2025-12-04T10:55:24.4840332Z 
2025-12-04T10:55:24.4840476Z cudagraph partition into 3 partitions
2025-12-04T10:55:24.4840928Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T10:55:24.4841392Z frames [('total', 1), ('ok', 1)]
2025-12-04T10:55:24.4841767Z stats [('calls_captured', 7), ('unique_graphs', 1)]
2025-12-04T10:55:24.4842461Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T10:55:24.4843550Z inductor [('triton_bundler_save_kernel', 8), ('extern_calls', 4), ('pattern_matcher_count', 2), ('pattern_matcher_nodes', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1)]
2025-12-04T10:55:24.4844443Z graph_break []
2025-12-04T10:55:24.4844818Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T10:55:24.4845345Z cudagraph partition due to non gpu ops. Found from : 
2025-12-04T10:55:24.4846026Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4159, in foo
2025-12-04T10:55:24.4846659Z     output1_cpu = output1.cpu() + 1
2025-12-04T10:55:24.4846887Z 
2025-12-04T10:55:24.4847026Z cudagraph partition due to non gpu ops
2025-12-04T10:55:24.4847549Z cudagraph partition due to DeviceCopy ops. Found from : 
2025-12-04T10:55:24.4848251Z    File "/var/lib/jenkins/workspace/test/inductor/test_cudagraph_trees.py", line 4161, in foo
2025-12-04T10:55:24.4848910Z     x2 = output1_cpu.to("cuda")
2025-12-04T10:55:24.4849116Z 
2025-12-04T10:55:24.4849247Z cudagraph partition into 3 partitions
2025-12-04T10:55:24.4850260Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-805c3a8113d13722.xml -
2025-12-04T10:55:24.4851395Z =========================== short test summary info ============================
2025-12-04T10:55:24.4852463Z FAILED [1.3174s] inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse - AssertionError: Tensor-likes are not close!
2025-12-04T10:55:24.4853309Z 
2025-12-04T10:55:24.4853431Z Mismatched elements: 64 / 128 (50.0%)
2025-12-04T10:55:24.4853983Z Greatest absolute difference: 2.709859848022461 at index (126,) (up to 1e-05 allowed)
2025-12-04T10:55:24.4854692Z Greatest relative difference: inf at index (64,) (up to 1.3e-06 allowed)
2025-12-04T10:55:24.4855082Z 
2025-12-04T10:55:24.4855216Z The failure occurred for item [0]
2025-12-04T10:55:24.4855441Z 
2025-12-04T10:55:24.4855652Z To execute this test, run the following from the base repo dir:
2025-12-04T10:55:24.4856666Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cudagraph_trees.py CudaGraphTreeTests.test_graph_partition_user_defined_triton_kernel_reuse
2025-12-04T10:55:24.4857469Z 
2025-12-04T10:55:24.4857732Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T10:55:24.4858828Z ERROR [0.0001s] inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse - AssertionError: Scalars are not equal!
2025-12-04T10:55:24.4859651Z 
2025-12-04T10:55:24.4859760Z Expected 0 but got 2.
2025-12-04T10:55:24.4860055Z Absolute difference: 2
2025-12-04T10:55:24.4860360Z Relative difference: inf
2025-12-04T10:55:24.4860740Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 2 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T10:55:24.4861285Z ===== 1 failed, 84 passed, 2 skipped, 1 error, 2 rerun in 84.50s (0:01:24) =====
2025-12-04T10:55:24.4861759Z Got exit code 1
2025-12-04T10:55:24.4862031Z Retrying single test...
2025-12-04T10:55:24.4862789Z Test results will be stored in test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-51c6b785ca935a69.xml
2025-12-04T10:55:24.4863683Z ============================= test session starts ==============================
2025-12-04T10:55:24.4864339Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:55:24.4864924Z cachedir: .pytest_cache
2025-12-04T10:55:24.4865604Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:55:24.4866374Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:55:24.4866717Z configfile: pytest.ini
2025-12-04T10:55:24.4867419Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:55:24.4868295Z collecting ... collected 166 items / 165 deselected / 1 selected
2025-12-04T10:55:24.4869375Z stepcurrent: skipping 86 already run items. Running only test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse
2025-12-04T10:55:24.4870354Z Running 1 items in this shard
2025-12-04T10:55:24.4870559Z 
2025-12-04T10:55:24.4871473Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse W1204 10:53:51.762000 86562 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4872902Z W1204 10:53:51.764000 86562 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4873599Z PASSED [6.1242s] [100%]
2025-12-04T10:55:24.4873775Z 
2025-12-04T10:55:24.4874553Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-51c6b785ca935a69.xml -
2025-12-04T10:55:24.4875662Z ====================== 1 passed, 165 deselected in 6.16s =======================
2025-12-04T10:55:24.4876091Z Got exit code 0
2025-12-04T10:55:24.4876493Z Test succeeded in new process, continuing with the rest of the tests
2025-12-04T10:55:24.4877564Z Test results will be stored in test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-e7f4556f4f4f751d.xml
2025-12-04T10:55:24.4878443Z ============================= test session starts ==============================
2025-12-04T10:55:24.4879089Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T10:55:24.4879676Z cachedir: .pytest_cache
2025-12-04T10:55:24.4880366Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T10:55:24.4881131Z rootdir: /var/lib/jenkins/workspace
2025-12-04T10:55:24.4881476Z configfile: pytest.ini
2025-12-04T10:55:24.4882251Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T10:55:24.4883128Z collecting ... collected 166 items / 87 deselected / 79 selected
2025-12-04T10:55:24.4883624Z stepcurrent: skipping 87 already run items.
2025-12-04T10:55:24.4884010Z Running 79 items in this shard
2025-12-04T10:55:24.4884218Z 
2025-12-04T10:55:24.4885051Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_view_fallback W1204 10:54:12.048000 86846 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4886402Z W1204 10:54:12.049000 86846 site-packages/torch/_inductor/utils.py:2565] [0/0] DeviceCopy in input program
2025-12-04T10:55:24.4887039Z PASSED [4.9775s] [  1%]
2025-12-04T10:55:24.4888140Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_with_memory_plan_reuse W1204 10:54:14.125000 86846 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T10:55:24.4889282Z PASSED [2.3079s] [  2%]
2025-12-04T10:55:24.4889934Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_item PASSED [0.2993s] [  3%]
2025-12-04T10:55:24.4891026Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero PASSED [0.3663s] [  5%]
2025-12-04T10:55:24.4892173Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_backend PASSED [0.2749s] [  6%]
2025-12-04T10:55:24.4893370Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_incompatible_cudagraph_ops_nonzero_graph_breaks PASSED [0.6579s] [  7%]
2025-12-04T10:55:24.4894413Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_index_put PASSED [0.6848s] [  8%]
2025-12-04T10:55:24.4895360Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_live_outputs_multiple_graphs PASSED [1.1972s] [ 10%]
2025-12-04T10:55:24.4896507Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_manager_per_device SKIPPED [0.0004s] (requires multiple cuda devices) [ 11%]
2025-12-04T10:55:24.4897553Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mark_step PASSED [0.6834s] [ 12%]
2025-12-04T10:55:24.4898418Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_meta_tensor PASSED [0.6780s] [ 13%]
2025-12-04T10:55:24.4899359Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_child_node PASSED [1.0923s] [ 15%]
2025-12-04T10:55:24.4900378Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_custom_module PASSED [0.8239s] [ 16%]
2025-12-04T10:55:24.4901700Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_custom_module_buffer PASSED [0.9133s] [ 17%]
2025-12-04T10:55:24.4902757Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_parent_node PASSED [1.1140s] [ 18%]
2025-12-04T10:55:24.4903857Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module PASSED [0.6419s] [ 20%]
2025-12-04T10:55:24.4905127Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_builtin_module_buffers PASSED [0.9013s] [ 21%]
2025-12-04T10:55:24.4906331Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multi_dispatch_single_compile_param_inputs PASSED [0.4832s] [ 22%]
2025-12-04T10:55:24.4908734Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multinomial SKIPPED [0.0009s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/166682 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 24%]
2025-12-04T10:55:24.4911221Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_cudagraphs SKIPPED [0.0002s] (requires multiple cuda devices) [ 25%]
2025-12-04T10:55:24.4912679Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_devices_msg_backend_inductor SKIPPED [0.0002s] (requires multiple cuda devices) [ 26%]
2025-12-04T10:55:24.4913935Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_multiple_insert_removal_caching PASSED [0.1995s] [ 27%]
2025-12-04T10:55:24.4915122Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_cudagraphs PASSED [0.3128s] [ 29%]
2025-12-04T10:55:24.4916435Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_backend_inductor PASSED [0.5533s] [ 30%]
2025-12-04T10:55:24.4917788Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_cudagraphs PASSED [0.3130s] [ 31%]
2025-12-04T10:55:24.4919182Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensor_warn_only_once_backend_inductor PASSED [0.5585s] [ 32%]
2025-12-04T10:55:24.4920500Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_cudagraphs PASSED [0.3307s] [ 34%]
2025-12-04T10:55:24.4921776Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_backend_inductor PASSED [0.5413s] [ 35%]
2025-12-04T10:55:24.4923157Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_cudagraphs PASSED [0.3225s] [ 36%]
2025-12-04T10:55:24.4924507Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_inductor PASSED [0.5309s] [ 37%]
2025-12-04T10:55:24.4925725Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_on_inp_backend_cudagraphs PASSED [0.3340s] [ 39%]
2025-12-04T10:55:24.4926805Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_on_inp_backend_inductor PASSED [0.5975s] [ 40%]
2025-12-04T10:55:24.4927827Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_reinplaced PASSED [0.4281s] [ 41%]
2025-12-04T10:55:24.4928859Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_no_rerecord_with_mark_static_address PASSED [0.8361s] [ 43%]
2025-12-04T10:55:24.4930063Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_not_fallback_to_eager_if_have_not_recompiling_too_many_times PASSED [0.4819s] [ 44%]
2025-12-04T10:55:24.4931155Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_output_alias PASSED [0.2144s] [ 45%]
2025-12-04T10:55:24.4932096Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_peristed_output_livenes PASSED [0.3698s] [ 46%]
2025-12-04T10:55:24.4933116Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_remove_hooks_on_cached_tensors PASSED [0.4224s] [ 48%]
2025-12-04T10:55:24.4934271Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rerecord_if_static_input_address_changed PASSED [0.5926s] [ 49%]
2025-12-04T10:55:24.4935291Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rng_non_trees PASSED [0.3138s] [ 50%]
2025-12-04T10:55:24.4936190Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_rng_trees PASSED [0.3055s] [ 51%]
2025-12-04T10:55:24.4937045Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_run_simple PASSED [0.7736s] [ 53%]
2025-12-04T10:55:24.4937975Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_separate_recordings PASSED [0.6962s] [ 54%]
2025-12-04T10:55:24.4938975Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_side_stream_memory_allocation PASSED [0.2207s] [ 55%]
2025-12-04T10:55:24.4939965Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_single_stream_use PASSED [0.5726s] [ 56%]
2025-12-04T10:55:24.4940911Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_cpp_wrapper PASSED [2.0238s] [ 58%]
2025-12-04T10:55:24.4941869Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_cudagraph_unsafe_ops PASSED [0.4294s] [ 59%]
2025-12-04T10:55:24.4942928Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached1 PASSED [1.1930s] [ 60%]
2025-12-04T10:55:24.4944036Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_if_dynamic_shape_limit_reached2 PASSED [11.5451s] [ 62%]
2025-12-04T10:55:24.4945035Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_skip_symbolic PASSED [0.4468s] [ 63%]
2025-12-04T10:55:24.4945893Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_sparsity PASSED [0.3235s] [ 64%]
2025-12-04T10:55:24.4946861Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_static_inputs_address_mutation_log PASSED [0.6451s] [ 65%]
2025-12-04T10:55:24.4947892Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_storage_access_error PASSED [0.2531s] [ 67%]
2025-12-04T10:55:24.4948881Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_constant_mutation PASSED [0.4722s] [ 68%]
2025-12-04T10:55:24.4949896Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_dies_between_checkpoint PASSED [0.2644s] [ 69%]
2025-12-04T10:55:24.4950937Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_tensor_no_longer_in_pool PASSED [0.2688s] [ 70%]
2025-12-04T10:55:24.4952001Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_no_cudagraphs PASSED [0.3517s] [ 72%]
2025-12-04T10:55:24.4953109Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_non_trees PASSED [0.3401s] [ 73%]
2025-12-04T10:55:24.4954155Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_input_trees PASSED [0.3427s] [ 74%]
2025-12-04T10:55:24.4955192Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unaligned_static_parameter PASSED [0.2590s] [ 75%]
2025-12-04T10:55:24.4956154Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_unstable_ptr PASSED [0.4235s] [ 77%]
2025-12-04T10:55:24.4957073Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warmup_stream_sync PASSED [5.3250s] [ 78%]
2025-12-04T10:55:24.4958037Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warn_on_pending_backward PASSED [0.4443s] [ 79%]
2025-12-04T10:55:24.4959098Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_warn_once_if_dynamic_shape_limit_reached PASSED [1.2814s] [ 81%]
2025-12-04T10:55:24.4960640Z inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_workspace_allocation_error [W1204 10:55:03.283122954 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor)
2025-12-04T10:55:24.4961784Z PASSED [16.3481s] [ 82%]
2025-12-04T10:55:24.4962345Z inductor/test_cudagraph_trees.py::TestSAC::test_cpu_and_cuda_rng PASSED [0.1776s] [ 83%]
2025-12-04T10:55:24.4963227Z inductor/test_cudagraph_trees.py::TestSAC::test_cudagraph_uneven_forward_backward PASSED [0.0051s] [ 84%]
2025-12-04T10:55:24.4965561Z inductor/test_cudagraph_trees.py::TestSAC::test_cudagraphs_aot_eager_compat_equal SKIPPED [0.0007s] (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/163852 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 86%]
2025-12-04T10:55:24.4968056Z inductor/test_cudagraph_trees.py::TestSAC::test_cudagraphs_aot_eager_compat_equal_device_one SKIPPED [0.0002s] (requires multiple cuda devices) [ 87%]
2025-12-04T10:55:24.4969299Z inductor/test_cudagraph_trees.py::TestSAC::test_graph_partition_cudagraphs_aot_eager_compat_equal PASSED [0.6589s] [ 88%]
2025-12-04T10:55:24.4970382Z inductor/test_cudagraph_trees.py::TestSAC::test_multi_device SKIPPED [0.0003s] (requires multiple cuda devices) [ 89%]
2025-12-04T10:55:24.4971287Z inductor/test_cudagraph_trees.py::TestSAC::test_retain_graph PASSED [0.1200s] [ 91%]
2025-12-04T10:55:24.4972026Z inductor/test_cudagraph_trees.py::TestSAC::test_simple PASSED [0.2320s]  [ 92%]
2025-12-04T10:55:24.4972834Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order0 PASSED [0.1448s] [ 93%]
2025-12-04T10:55:24.4973753Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order1 PASSED [0.1403s] [ 94%]
2025-12-04T10:55:24.4974665Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order2 PASSED [0.1397s] [ 96%]
2025-12-04T10:55:24.4975573Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order3 PASSED [0.1398s] [ 97%]
2025-12-04T10:55:24.4976463Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order4 PASSED [0.1392s] [ 98%]
2025-12-04T10:55:24.4977371Z inductor/test_cudagraph_trees.py::TestSAC::test_uneven_forward_backward_order5 PASSED [0.1393s] [100%]
2025-12-04T10:55:24.4977905Z 
2025-12-04T10:55:24.4978677Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-e7f4556f4f4f751d.xml -
2025-12-04T10:55:24.4979794Z =========== 72 passed, 7 skipped, 87 deselected in 74.13s (0:01:14) ============
2025-12-04T10:55:24.4980933Z The following tests failed and then succeeded when run in a new process['test/inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_graph_partition_user_defined_triton_kernel_reuse']
2025-12-04T10:55:24.4981865Z 
2025-12-04T10:55:24.4982440Z FINISHED PRINTING LOG FILE of inductor/test_cudagraph_trees 1/1 (test/test-reports/inductor.test_cudagraph_trees_1.1_054bcfe63a557371_.log)
2025-12-04T10:55:24.4983153Z 
2025-12-04T10:55:24.4983514Z Finished inductor/test_cudagraph_trees 1/1 ... [2025-12-04 10:55:24.440612][6082.050521013], took 3.38min
2025-12-04T10:55:24.4984867Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-805c3a8113d13722.xml
2025-12-04T10:55:24.5260750Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-51c6b785ca935a69.xml
2025-12-04T10:55:24.5525963Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-e7f4556f4f4f751d.xml
2025-12-04T10:55:24.5862176Z Running inductor/test_cuda_select_algorithm 4/5 ... [2025-12-04 10:55:24.586001][6082.195909186]
2025-12-04T10:55:24.5862795Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T10:55:24.5865917Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cuda_select_algorithm.py', '--shard-id=4', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 10:55:24.586371]
2025-12-04T11:11:26.2613299Z 
2025-12-04T11:11:26.2615388Z PRINTING LOG FILE of inductor/test_cuda_select_algorithm 4/5 (test/test-reports/inductor.test_cuda_select_algorithm_4.5_53b34f2889361847_.log)
2025-12-04T11:11:26.2616975Z W1204 10:55:33.511000 88082 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.2619029Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c40e88b21f3dd767.xml
2025-12-04T11:11:26.2620165Z ============================= test session starts ==============================
2025-12-04T11:11:26.2620938Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.2621538Z cachedir: .pytest_cache
2025-12-04T11:11:26.2622602Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.2623745Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.2624435Z configfile: pytest.ini
2025-12-04T11:11:26.2625581Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.2626876Z collecting ... collected 58 items
2025-12-04T11:11:26.2627267Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T11:11:26.2640409Z Running 11 items in this shard: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16, test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.2653858Z 
2025-12-04T11:11:26.2655255Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7033s] [  9%]
2025-12-04T11:11:26.2658027Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.3988s] [  9%]
2025-12-04T11:11:26.2660678Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.3970s] [  9%]
2025-12-04T11:11:26.2661897Z 
2025-12-04T11:11:26.2662103Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.2663476Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.2664547Z Traceback (most recent call last):
2025-12-04T11:11:26.2665705Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.2666998Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.2668112Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.2669220Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.2672111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.2673239Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.2673935Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.2674356Z 
2025-12-04T11:11:26.2674518Z Expected 1 but got 2.
2025-12-04T11:11:26.2674999Z Absolute difference: 1
2025-12-04T11:11:26.2675431Z Relative difference: 1.0
2025-12-04T11:11:26.2675655Z 
2025-12-04T11:11:26.2676016Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.2678007Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.2679528Z 
2025-12-04T11:11:26.2679941Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.2680875Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2681644Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2682690Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2684004Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2684730Z graph_break []
2025-12-04T11:11:26.2685296Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2687114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2688547Z   warnings.warn(
2025-12-04T11:11:26.2689898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2691393Z   warnings.warn(
2025-12-04T11:11:26.2692381Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.2693413Z Traceback (most recent call last):
2025-12-04T11:11:26.2694282Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.2695273Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.2696152Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.2696906Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.2697733Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.2698600Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.2699075Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.2699320Z 
2025-12-04T11:11:26.2699428Z Expected 1 but got 2.
2025-12-04T11:11:26.2699716Z Absolute difference: 1
2025-12-04T11:11:26.2700009Z Relative difference: 1.0
2025-12-04T11:11:26.2700197Z 
2025-12-04T11:11:26.2700408Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.2702041Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.2703077Z 
2025-12-04T11:11:26.2703340Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.2704020Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2704488Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2705223Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2706165Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2706629Z graph_break []
2025-12-04T11:11:26.2706987Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2708077Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2709034Z   warnings.warn(
2025-12-04T11:11:26.2709898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2710844Z   warnings.warn(
2025-12-04T11:11:26.2711221Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2711698Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2712125Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2713002Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2713756Z graph_break []
2025-12-04T11:11:26.2714109Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2715182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2716128Z   warnings.warn(
2025-12-04T11:11:26.2717001Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2717926Z   warnings.warn(
2025-12-04T11:11:26.2718234Z =================================== FAILURES ===================================
2025-12-04T11:11:26.2719027Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.2719782Z Traceback (most recent call last):
2025-12-04T11:11:26.2720506Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.2721365Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.2722256Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.2723000Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.2723824Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.2724697Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.2725172Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.2725418Z 
2025-12-04T11:11:26.2725522Z Expected 1 but got 2.
2025-12-04T11:11:26.2725807Z Absolute difference: 1
2025-12-04T11:11:26.2726100Z Relative difference: 1.0
2025-12-04T11:11:26.2726286Z 
2025-12-04T11:11:26.2726494Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.2727808Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.2728834Z 
2025-12-04T11:11:26.2729097Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.2729715Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2730237Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2730969Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2731884Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2732344Z graph_break []
2025-12-04T11:11:26.2732699Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2733773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2734724Z   warnings.warn(
2025-12-04T11:11:26.2735572Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2736516Z   warnings.warn(
2025-12-04T11:11:26.2736893Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2737360Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2737780Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2738668Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2739416Z graph_break []
2025-12-04T11:11:26.2739769Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2740841Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2741889Z   warnings.warn(
2025-12-04T11:11:26.2742876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2743810Z   warnings.warn(
2025-12-04T11:11:26.2744193Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2744665Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2745103Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2745970Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2746721Z graph_break []
2025-12-04T11:11:26.2747091Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2748147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2749094Z   warnings.warn(
2025-12-04T11:11:26.2749961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2750904Z   warnings.warn(
2025-12-04T11:11:26.2751875Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c40e88b21f3dd767.xml -
2025-12-04T11:11:26.2753011Z =========================== short test summary info ============================
2025-12-04T11:11:26.2754244Z FAILED [0.3970s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.2755282Z 
2025-12-04T11:11:26.2755400Z Expected 1 but got 2.
2025-12-04T11:11:26.2755768Z Absolute difference: 1
2025-12-04T11:11:26.2756073Z Relative difference: 1.0
2025-12-04T11:11:26.2756261Z 
2025-12-04T11:11:26.2756494Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.2758718Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.2759743Z 
2025-12-04T11:11:26.2760044Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.2760630Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.2761118Z ========================== 1 failed, 2 rerun in 4.53s ==========================
2025-12-04T11:11:26.2761600Z Got exit code 1
2025-12-04T11:11:26.2761856Z Retrying single test...
2025-12-04T11:11:26.2762489Z W1204 10:55:53.043000 88251 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.2763718Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9074e5af9f7e7d92.xml
2025-12-04T11:11:26.2764658Z ============================= test session starts ==============================
2025-12-04T11:11:26.2765321Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.2765919Z cachedir: .pytest_cache
2025-12-04T11:11:26.2766628Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.2767394Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.2767756Z configfile: pytest.ini
2025-12-04T11:11:26.2768475Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.2769356Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.2770665Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.2771880Z Running 1 items in this shard
2025-12-04T11:11:26.2772085Z 
2025-12-04T11:11:26.2773351Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:55:56.153388380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2774740Z 
2025-12-04T11:11:26.2775265Z [W1204 10:56:12.736282206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2775910Z 
2025-12-04T11:11:26.2776434Z [W1204 10:56:12.736533603 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2777074Z 
2025-12-04T11:11:26.2777573Z [W1204 10:56:12.743718646 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2778224Z 
2025-12-04T11:11:26.2778726Z [W1204 10:56:12.744409162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2779374Z 
2025-12-04T11:11:26.2779925Z [W1204 10:56:12.744596111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2780560Z 
2025-12-04T11:11:26.2781076Z [W1204 10:56:12.751482569 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2781713Z 
2025-12-04T11:11:26.2782334Z [W1204 10:56:12.752162629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2782971Z 
2025-12-04T11:11:26.2783471Z [W1204 10:56:12.752347189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2784162Z 
2025-12-04T11:11:26.2784661Z [W1204 10:56:14.696887963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2785314Z 
2025-12-04T11:11:26.2785818Z [W1204 10:56:14.698621794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2786494Z 
2025-12-04T11:11:26.2786997Z [W1204 10:56:14.698825805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2787631Z 
2025-12-04T11:11:26.2788154Z [W1204 10:56:14.702704605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2788791Z 
2025-12-04T11:11:26.2789305Z [W1204 10:56:14.703335993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2789937Z 
2025-12-04T11:11:26.2790445Z [W1204 10:56:14.703532011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2791091Z 
2025-12-04T11:11:26.2791596Z [W1204 10:56:14.709440101 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2792244Z 
2025-12-04T11:11:26.2792750Z [W1204 10:56:14.710084747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2793386Z 
2025-12-04T11:11:26.2793909Z [W1204 10:56:14.710287979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2794546Z 
2025-12-04T11:11:26.2794691Z ('RERUN', {'yellow': True}) [19.3198s] [100%]
2025-12-04T11:11:26.2796183Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:14.063034583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2797561Z 
2025-12-04T11:11:26.2798067Z [W1204 10:56:14.063798034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2798714Z 
2025-12-04T11:11:26.2799219Z [W1204 10:56:14.063994248 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2799851Z 
2025-12-04T11:11:26.2800360Z [W1204 10:56:14.067847114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2801191Z 
2025-12-04T11:11:26.2801767Z [W1204 10:56:14.068610926 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2802403Z 
2025-12-04T11:11:26.2802907Z [W1204 10:56:14.068798938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2803554Z 
2025-12-04T11:11:26.2804053Z [W1204 10:56:14.074761041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2804701Z 
2025-12-04T11:11:26.2805203Z [W1204 10:56:14.075378078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2805835Z 
2025-12-04T11:11:26.2806353Z [W1204 10:56:14.075562586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2806985Z 
2025-12-04T11:11:26.2807642Z [W1204 10:56:14.163880897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2808279Z 
2025-12-04T11:11:26.2808801Z [W1204 10:56:14.164673307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2809502Z 
2025-12-04T11:11:26.2810004Z [W1204 10:56:14.164883147 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2810697Z 
2025-12-04T11:11:26.2811199Z [W1204 10:56:14.168769227 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2811832Z 
2025-12-04T11:11:26.2812350Z [W1204 10:56:14.169395799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2812988Z 
2025-12-04T11:11:26.2813510Z [W1204 10:56:14.169587425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2814143Z 
2025-12-04T11:11:26.2814642Z [W1204 10:56:14.175566612 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2815292Z 
2025-12-04T11:11:26.2815793Z [W1204 10:56:14.176376114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2816447Z 
2025-12-04T11:11:26.2816948Z [W1204 10:56:14.176566289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2817586Z 
2025-12-04T11:11:26.2817731Z ('RERUN', {'yellow': True}) [0.4280s] [100%]
2025-12-04T11:11:26.2819233Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:14.471158691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2820596Z 
2025-12-04T11:11:26.2821101Z [W1204 10:56:14.471911341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2821751Z 
2025-12-04T11:11:26.2822252Z [W1204 10:56:14.472107232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2822900Z 
2025-12-04T11:11:26.2823403Z [W1204 10:56:14.475990930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2824051Z 
2025-12-04T11:11:26.2824555Z [W1204 10:56:14.476765413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2825193Z 
2025-12-04T11:11:26.2825710Z [W1204 10:56:14.476953296 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2826346Z 
2025-12-04T11:11:26.2826862Z [W1204 10:56:14.482869398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2827498Z 
2025-12-04T11:11:26.2828002Z [W1204 10:56:14.483491792 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2828653Z 
2025-12-04T11:11:26.2829157Z [W1204 10:56:14.483674941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2829803Z 
2025-12-04T11:11:26.2830303Z [W1204 10:56:14.569019750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2830935Z 
2025-12-04T11:11:26.2831531Z [W1204 10:56:14.569751367 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2832166Z 
2025-12-04T11:11:26.2832684Z [W1204 10:56:14.569950798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2833347Z 
2025-12-04T11:11:26.2833847Z [W1204 10:56:14.573797332 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2834493Z 
2025-12-04T11:11:26.2835024Z [W1204 10:56:14.574424349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2835670Z 
2025-12-04T11:11:26.2836173Z [W1204 10:56:14.574615260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2836804Z 
2025-12-04T11:11:26.2837321Z [W1204 10:56:14.580475550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2837957Z 
2025-12-04T11:11:26.2838471Z [W1204 10:56:14.581239295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2839107Z 
2025-12-04T11:11:26.2839606Z [W1204 10:56:14.581428241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2840250Z 
2025-12-04T11:11:26.2840349Z FAILED [0.4031s] [100%]
2025-12-04T11:11:26.2840535Z 
2025-12-04T11:11:26.2840677Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.2841518Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.2842263Z Traceback (most recent call last):
2025-12-04T11:11:26.2843006Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.2843880Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.2844697Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.2845442Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.2846275Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.2847145Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.2847601Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.2847860Z 
2025-12-04T11:11:26.2847965Z Expected 1 but got 2.
2025-12-04T11:11:26.2848250Z Absolute difference: 1
2025-12-04T11:11:26.2848538Z Relative difference: 1.0
2025-12-04T11:11:26.2848724Z 
2025-12-04T11:11:26.2848936Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.2850173Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.2851185Z 
2025-12-04T11:11:26.2851465Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.2852087Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2852550Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2853282Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2854170Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2854623Z graph_break []
2025-12-04T11:11:26.2854996Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2856615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.2858053Z   if out == self.unknown_value:
2025-12-04T11:11:26.2859012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2859974Z   warnings.warn(
2025-12-04T11:11:26.2860849Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2861832Z   warnings.warn(
2025-12-04T11:11:26.2862480Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.2863245Z Traceback (most recent call last):
2025-12-04T11:11:26.2863993Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.2864849Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.2865657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.2866410Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.2867227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.2868084Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.2868552Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.2868797Z 
2025-12-04T11:11:26.2868916Z Expected 1 but got 2.
2025-12-04T11:11:26.2869200Z Absolute difference: 1
2025-12-04T11:11:26.2869481Z Relative difference: 1.0
2025-12-04T11:11:26.2869677Z 
2025-12-04T11:11:26.2869884Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.2871115Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.2872122Z 
2025-12-04T11:11:26.2872385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.2873001Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2873469Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2874204Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2875126Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2875592Z graph_break []
2025-12-04T11:11:26.2875958Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2877502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.2878926Z   if out == self.unknown_value:
2025-12-04T11:11:26.2879858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2880834Z   warnings.warn(
2025-12-04T11:11:26.2881768Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2882737Z   warnings.warn(
2025-12-04T11:11:26.2883119Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2883620Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2884162Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2885048Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2885843Z graph_break []
2025-12-04T11:11:26.2886216Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2887269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2888259Z   warnings.warn(
2025-12-04T11:11:26.2889119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2890049Z   warnings.warn(
2025-12-04T11:11:26.2890359Z =================================== FAILURES ===================================
2025-12-04T11:11:26.2891185Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.2891939Z Traceback (most recent call last):
2025-12-04T11:11:26.2892694Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.2893594Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.2894409Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.2895165Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.2895975Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.2896853Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.2897358Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.2897608Z 
2025-12-04T11:11:26.2897716Z Expected 1 but got 2.
2025-12-04T11:11:26.2898003Z Absolute difference: 1
2025-12-04T11:11:26.2898299Z Relative difference: 1.0
2025-12-04T11:11:26.2898487Z 
2025-12-04T11:11:26.2898710Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.2899936Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.2901154Z 
2025-12-04T11:11:26.2901424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.2902061Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2902536Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2903267Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2904171Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2904632Z graph_break []
2025-12-04T11:11:26.2904988Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2906528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.2907969Z   if out == self.unknown_value:
2025-12-04T11:11:26.2908903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2909836Z   warnings.warn(
2025-12-04T11:11:26.2910712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2911653Z   warnings.warn(
2025-12-04T11:11:26.2912201Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2912668Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2913106Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2914040Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2914779Z graph_break []
2025-12-04T11:11:26.2915152Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2916274Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2917227Z   warnings.warn(
2025-12-04T11:11:26.2918092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2919041Z   warnings.warn(
2025-12-04T11:11:26.2919417Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.2919872Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.2920315Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.2921195Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.2922028Z graph_break []
2025-12-04T11:11:26.2922384Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.2923457Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2924399Z   warnings.warn(
2025-12-04T11:11:26.2925288Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.2926223Z   warnings.warn(
2025-12-04T11:11:26.2927216Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9074e5af9f7e7d92.xml -
2025-12-04T11:11:26.2928349Z =========================== short test summary info ============================
2025-12-04T11:11:26.2929594Z FAILED [0.4031s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.2930627Z 
2025-12-04T11:11:26.2930734Z Expected 1 but got 2.
2025-12-04T11:11:26.2931026Z Absolute difference: 1
2025-12-04T11:11:26.2931324Z Relative difference: 1.0
2025-12-04T11:11:26.2931514Z 
2025-12-04T11:11:26.2931725Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.2932965Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.2933994Z 
2025-12-04T11:11:26.2934258Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.2934840Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.2935351Z ================== 1 failed, 10 deselected, 2 rerun in 20.18s ==================
2025-12-04T11:11:26.2935800Z Got exit code 1
2025-12-04T11:11:26.2936067Z Retrying single test...
2025-12-04T11:11:26.2936688Z W1204 10:56:26.002000 88425 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.2937894Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-10ff13c663ad5077.xml
2025-12-04T11:11:26.2938971Z ============================= test session starts ==============================
2025-12-04T11:11:26.2939621Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.2940204Z cachedir: .pytest_cache
2025-12-04T11:11:26.2940941Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.2941714Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.2942065Z configfile: pytest.ini
2025-12-04T11:11:26.2942800Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.2943674Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.2944986Z stepcurrent: skipping 0 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.2946183Z Running 1 items in this shard
2025-12-04T11:11:26.2946387Z 
2025-12-04T11:11:26.2947631Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:29.098216967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2949022Z 
2025-12-04T11:11:26.2949529Z [W1204 10:56:44.869871539 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2950180Z 
2025-12-04T11:11:26.2950683Z [W1204 10:56:44.870155439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2951337Z 
2025-12-04T11:11:26.2951843Z [W1204 10:56:44.877374852 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2952477Z 
2025-12-04T11:11:26.2952993Z [W1204 10:56:44.878093244 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2953629Z 
2025-12-04T11:11:26.2954130Z [W1204 10:56:44.878288939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2954774Z 
2025-12-04T11:11:26.2955279Z [W1204 10:56:44.885080664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2955926Z 
2025-12-04T11:11:26.2956425Z [W1204 10:56:44.885714402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2957074Z 
2025-12-04T11:11:26.2957577Z [W1204 10:56:44.885893376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2958210Z 
2025-12-04T11:11:26.2958731Z [W1204 10:56:46.828084914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2959363Z 
2025-12-04T11:11:26.2959878Z [W1204 10:56:46.829787664 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2960513Z 
2025-12-04T11:11:26.2961009Z [W1204 10:56:46.830015007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2961730Z 
2025-12-04T11:11:26.2962229Z [W1204 10:56:46.833891649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2962872Z 
2025-12-04T11:11:26.2963377Z [W1204 10:56:46.834533565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2964009Z 
2025-12-04T11:11:26.2964620Z [W1204 10:56:46.834721693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2965257Z 
2025-12-04T11:11:26.2965774Z [W1204 10:56:46.840617180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2966443Z 
2025-12-04T11:11:26.2966941Z [W1204 10:56:46.841238598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2967621Z 
2025-12-04T11:11:26.2968120Z [W1204 10:56:46.841425360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2968767Z 
2025-12-04T11:11:26.2968898Z ('RERUN', {'yellow': True}) [18.5034s] [100%]
2025-12-04T11:11:26.2970402Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:46.197235976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2971762Z 
2025-12-04T11:11:26.2972284Z [W1204 10:56:46.197981154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2972919Z 
2025-12-04T11:11:26.2973418Z [W1204 10:56:46.198202172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2974067Z 
2025-12-04T11:11:26.2974572Z [W1204 10:56:46.202020570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2975219Z 
2025-12-04T11:11:26.2975721Z [W1204 10:56:46.202789913 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2976430Z 
2025-12-04T11:11:26.2977186Z [W1204 10:56:46.202975955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2977825Z 
2025-12-04T11:11:26.2978336Z [W1204 10:56:46.208781567 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2978972Z 
2025-12-04T11:11:26.2979471Z [W1204 10:56:46.209373872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2980123Z 
2025-12-04T11:11:26.2980622Z [W1204 10:56:46.209552665 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2981267Z 
2025-12-04T11:11:26.2981766Z [W1204 10:56:46.294293671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2982397Z 
2025-12-04T11:11:26.2982913Z [W1204 10:56:46.295037406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2983544Z 
2025-12-04T11:11:26.2984057Z [W1204 10:56:46.295237725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2984690Z 
2025-12-04T11:11:26.2985187Z [W1204 10:56:46.299026941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2985835Z 
2025-12-04T11:11:26.2986331Z [W1204 10:56:46.299638874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2986972Z 
2025-12-04T11:11:26.2987469Z [W1204 10:56:46.299827696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2988114Z 
2025-12-04T11:11:26.2988692Z [W1204 10:56:46.305687767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2989328Z 
2025-12-04T11:11:26.2989842Z [W1204 10:56:46.306462577 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2990507Z 
2025-12-04T11:11:26.2991022Z [W1204 10:56:46.306649558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2991654Z 
2025-12-04T11:11:26.2991816Z ('RERUN', {'yellow': True}) [0.4263s] [100%]
2025-12-04T11:11:26.2993316Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 [W1204 10:56:46.597767375 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2994693Z 
2025-12-04T11:11:26.2995197Z [W1204 10:56:46.598513684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2995833Z 
2025-12-04T11:11:26.2996347Z [W1204 10:56:46.598708987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2996982Z 
2025-12-04T11:11:26.2997495Z [W1204 10:56:46.602570838 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2998126Z 
2025-12-04T11:11:26.2998627Z [W1204 10:56:46.603376467 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.2999273Z 
2025-12-04T11:11:26.2999770Z [W1204 10:56:46.603561483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3000412Z 
2025-12-04T11:11:26.3001110Z [W1204 10:56:46.609423164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3001811Z 
2025-12-04T11:11:26.3002324Z [W1204 10:56:46.610075409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3002959Z 
2025-12-04T11:11:26.3003476Z [W1204 10:56:46.610276998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3004105Z 
2025-12-04T11:11:26.3004603Z [W1204 10:56:47.695954333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3005251Z 
2025-12-04T11:11:26.3005750Z [W1204 10:56:47.696694028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3006400Z 
2025-12-04T11:11:26.3006904Z [W1204 10:56:47.696909425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3007538Z 
2025-12-04T11:11:26.3008049Z [W1204 10:56:47.700722824 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3008685Z 
2025-12-04T11:11:26.3009201Z [W1204 10:56:47.701338652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3009837Z 
2025-12-04T11:11:26.3010336Z [W1204 10:56:47.701526798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3010986Z 
2025-12-04T11:11:26.3011485Z [W1204 10:56:47.707331907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3012132Z 
2025-12-04T11:11:26.3012630Z [W1204 10:56:47.708092736 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3013404Z 
2025-12-04T11:11:26.3013917Z [W1204 10:56:47.708278729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3014546Z 
2025-12-04T11:11:26.3014704Z FAILED [0.3998s] [100%]
2025-12-04T11:11:26.3014877Z 
2025-12-04T11:11:26.3015020Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.3015800Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3016626Z Traceback (most recent call last):
2025-12-04T11:11:26.3017367Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3018215Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3019027Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3019783Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3020594Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3021464Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3021933Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3022182Z 
2025-12-04T11:11:26.3022303Z Expected 1 but got 2.
2025-12-04T11:11:26.3022576Z Absolute difference: 1
2025-12-04T11:11:26.3022872Z Relative difference: 1.0
2025-12-04T11:11:26.3023058Z 
2025-12-04T11:11:26.3023283Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3024507Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3025535Z 
2025-12-04T11:11:26.3025802Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3026424Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3026901Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3027630Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3028512Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3028984Z graph_break []
2025-12-04T11:11:26.3029360Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3030891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3032331Z   if out == self.unknown_value:
2025-12-04T11:11:26.3033276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3034230Z   warnings.warn(
2025-12-04T11:11:26.3035095Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3036045Z   warnings.warn(
2025-12-04T11:11:26.3036708Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3037461Z Traceback (most recent call last):
2025-12-04T11:11:26.3038198Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3039061Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3039965Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3040700Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3041605Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3042517Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3042983Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3043229Z 
2025-12-04T11:11:26.3043335Z Expected 1 but got 2.
2025-12-04T11:11:26.3043656Z Absolute difference: 1
2025-12-04T11:11:26.3043945Z Relative difference: 1.0
2025-12-04T11:11:26.3044131Z 
2025-12-04T11:11:26.3044338Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3045571Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3046591Z 
2025-12-04T11:11:26.3046855Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3047470Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3047932Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3048668Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3049545Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3050007Z graph_break []
2025-12-04T11:11:26.3050358Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3051892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3053323Z   if out == self.unknown_value:
2025-12-04T11:11:26.3054248Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3055179Z   warnings.warn(
2025-12-04T11:11:26.3056046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3056992Z   warnings.warn(
2025-12-04T11:11:26.3057358Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3057827Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3058270Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3059150Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3059894Z graph_break []
2025-12-04T11:11:26.3060268Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3061337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3062270Z   warnings.warn(
2025-12-04T11:11:26.3063139Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3064083Z   warnings.warn(
2025-12-04T11:11:26.3064387Z =================================== FAILURES ===================================
2025-12-04T11:11:26.3065154Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3065903Z Traceback (most recent call last):
2025-12-04T11:11:26.3066723Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3067585Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3068381Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3069160Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3069982Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3070867Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3071338Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3071598Z 
2025-12-04T11:11:26.3071701Z Expected 1 but got 2.
2025-12-04T11:11:26.3071985Z Absolute difference: 1
2025-12-04T11:11:26.3072261Z Relative difference: 1.0
2025-12-04T11:11:26.3072462Z 
2025-12-04T11:11:26.3072671Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3073900Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3074907Z 
2025-12-04T11:11:26.3075182Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3075789Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3076256Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3076990Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3077856Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3078316Z graph_break []
2025-12-04T11:11:26.3078681Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3080222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3081709Z   if out == self.unknown_value:
2025-12-04T11:11:26.3082649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3083608Z   warnings.warn(
2025-12-04T11:11:26.3084483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3085412Z   warnings.warn(
2025-12-04T11:11:26.3085786Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3086259Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3086681Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3087559Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3088309Z graph_break []
2025-12-04T11:11:26.3088679Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3089728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3090679Z   warnings.warn(
2025-12-04T11:11:26.3091559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3092497Z   warnings.warn(
2025-12-04T11:11:26.3092861Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3093331Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3093858Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3094730Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3095518Z graph_break []
2025-12-04T11:11:26.3095896Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3096966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3097961Z   warnings.warn(
2025-12-04T11:11:26.3098835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3099781Z   warnings.warn(
2025-12-04T11:11:26.3100758Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-10ff13c663ad5077.xml -
2025-12-04T11:11:26.3102090Z =========================== short test summary info ============================
2025-12-04T11:11:26.3103325Z FAILED [0.3998s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3104360Z 
2025-12-04T11:11:26.3104481Z Expected 1 but got 2.
2025-12-04T11:11:26.3104760Z Absolute difference: 1
2025-12-04T11:11:26.3105059Z Relative difference: 1.0
2025-12-04T11:11:26.3105263Z 
2025-12-04T11:11:26.3105477Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3106712Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3107719Z 
2025-12-04T11:11:26.3107988Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3108573Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.3109087Z ================== 1 failed, 10 deselected, 2 rerun in 19.36s ==================
2025-12-04T11:11:26.3109522Z Got exit code 1
2025-12-04T11:11:26.3110463Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3111800Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:11:26.3112780Z W1204 10:56:58.174000 88599 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.3113989Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee0de851594c228e.xml
2025-12-04T11:11:26.3114920Z ============================= test session starts ==============================
2025-12-04T11:11:26.3115569Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.3116161Z cachedir: .pytest_cache
2025-12-04T11:11:26.3116845Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.3117616Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.3117963Z configfile: pytest.ini
2025-12-04T11:11:26.3118678Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.3119536Z collecting ... collected 58 items / 1 deselected / 57 selected
2025-12-04T11:11:26.3120020Z stepcurrent: skipping 1 already run items.
2025-12-04T11:11:26.3120400Z Running 10 items in this shard
2025-12-04T11:11:26.3120601Z 
2025-12-04T11:11:26.3121678Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7192s] [ 10%]
2025-12-04T11:11:26.3123485Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4016s] [ 10%]
2025-12-04T11:11:26.3125283Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 FAILED [0.4080s] [ 10%]
2025-12-04T11:11:26.3126242Z 
2025-12-04T11:11:26.3126382Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.3127162Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3127902Z Traceback (most recent call last):
2025-12-04T11:11:26.3128639Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3129503Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3130322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3131060Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3131887Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3132755Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3133210Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3133471Z 
2025-12-04T11:11:26.3133576Z Expected 1 but got 2.
2025-12-04T11:11:26.3133864Z Absolute difference: 1
2025-12-04T11:11:26.3134153Z Relative difference: 1.0
2025-12-04T11:11:26.3134342Z 
2025-12-04T11:11:26.3134553Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3135785Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3136807Z 
2025-12-04T11:11:26.3137066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3137684Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3138141Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3138872Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3139750Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3140191Z graph_break []
2025-12-04T11:11:26.3140564Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3141644Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3142595Z   warnings.warn(
2025-12-04T11:11:26.3143459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3144404Z   warnings.warn(
2025-12-04T11:11:26.3145067Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3145824Z Traceback (most recent call last):
2025-12-04T11:11:26.3146548Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3147411Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3148284Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3149022Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3149842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3150744Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3151213Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3151457Z 
2025-12-04T11:11:26.3151595Z Expected 1 but got 2.
2025-12-04T11:11:26.3151879Z Absolute difference: 1
2025-12-04T11:11:26.3152170Z Relative difference: 1.0
2025-12-04T11:11:26.3152356Z 
2025-12-04T11:11:26.3152567Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3153802Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3154829Z 
2025-12-04T11:11:26.3155088Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3155706Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3156163Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3156889Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3157760Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3158220Z graph_break []
2025-12-04T11:11:26.3158572Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3159639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3160590Z   warnings.warn(
2025-12-04T11:11:26.3161535Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3162474Z   warnings.warn(
2025-12-04T11:11:26.3162851Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3163326Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3163754Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3164634Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3165391Z graph_break []
2025-12-04T11:11:26.3165757Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3166817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3167767Z   warnings.warn(
2025-12-04T11:11:26.3168641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3201132Z   warnings.warn(
2025-12-04T11:11:26.3201672Z =================================== FAILURES ===================================
2025-12-04T11:11:26.3202544Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3203339Z Traceback (most recent call last):
2025-12-04T11:11:26.3204077Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3204945Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3205763Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3206713Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3207543Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3208412Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3208943Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3209191Z 
2025-12-04T11:11:26.3209301Z Expected 1 but got 2.
2025-12-04T11:11:26.3209589Z Absolute difference: 1
2025-12-04T11:11:26.3209884Z Relative difference: 1.0
2025-12-04T11:11:26.3210123Z 
2025-12-04T11:11:26.3210328Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3211543Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3212554Z 
2025-12-04T11:11:26.3212812Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3213412Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3213857Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3214572Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3215430Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3215862Z graph_break []
2025-12-04T11:11:26.3216215Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3217268Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3218217Z   warnings.warn(
2025-12-04T11:11:26.3219069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3219989Z   warnings.warn(
2025-12-04T11:11:26.3220354Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3220801Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3221212Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3222071Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3222811Z graph_break []
2025-12-04T11:11:26.3223154Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3224205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3225133Z   warnings.warn(
2025-12-04T11:11:26.3225989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3226919Z   warnings.warn(
2025-12-04T11:11:26.3227277Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3227729Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3228146Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3229002Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3229742Z graph_break []
2025-12-04T11:11:26.3230096Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3231139Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3232063Z   warnings.warn(
2025-12-04T11:11:26.3232984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3233915Z   warnings.warn(
2025-12-04T11:11:26.3234957Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee0de851594c228e.xml -
2025-12-04T11:11:26.3236053Z =========================== short test summary info ============================
2025-12-04T11:11:26.3237313Z FAILED [0.4080s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3238348Z 
2025-12-04T11:11:26.3238460Z Expected 1 but got 2.
2025-12-04T11:11:26.3238727Z Absolute difference: 1
2025-12-04T11:11:26.3239012Z Relative difference: 1.0
2025-12-04T11:11:26.3239199Z 
2025-12-04T11:11:26.3239430Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3240635Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3241724Z 
2025-12-04T11:11:26.3241976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3242538Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.3243026Z =================== 1 failed, 1 deselected, 2 rerun in 4.56s ===================
2025-12-04T11:11:26.3243428Z Got exit code 1
2025-12-04T11:11:26.3243681Z Retrying single test...
2025-12-04T11:11:26.3244306Z W1204 10:57:17.701000 88768 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.3245531Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eb93cd35b9ecccb8.xml
2025-12-04T11:11:26.3246469Z ============================= test session starts ==============================
2025-12-04T11:11:26.3247120Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.3247711Z cachedir: .pytest_cache
2025-12-04T11:11:26.3248396Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.3249157Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.3249481Z configfile: pytest.ini
2025-12-04T11:11:26.3250185Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.3251036Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.3252327Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3253513Z Running 1 items in this shard
2025-12-04T11:11:26.3253709Z 
2025-12-04T11:11:26.3254953Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:21.781846996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3256331Z 
2025-12-04T11:11:26.3256844Z [W1204 10:57:36.335603287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3257473Z 
2025-12-04T11:11:26.3257976Z [W1204 10:57:36.335851769 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3258610Z 
2025-12-04T11:11:26.3259177Z [W1204 10:57:36.343104974 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3259815Z 
2025-12-04T11:11:26.3260307Z [W1204 10:57:36.343850134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3260973Z 
2025-12-04T11:11:26.3261479Z [W1204 10:57:36.344034881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3262139Z 
2025-12-04T11:11:26.3262644Z [W1204 10:57:36.350770590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3263273Z 
2025-12-04T11:11:26.3263770Z [W1204 10:57:36.351418961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3264408Z 
2025-12-04T11:11:26.3264911Z [W1204 10:57:36.351598410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3265547Z 
2025-12-04T11:11:26.3266036Z [W1204 10:57:38.293525572 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3266668Z 
2025-12-04T11:11:26.3267175Z [W1204 10:57:38.295232563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3267806Z 
2025-12-04T11:11:26.3268316Z [W1204 10:57:38.295432008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3268947Z 
2025-12-04T11:11:26.3269446Z [W1204 10:57:38.299216828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3270088Z 
2025-12-04T11:11:26.3270583Z [W1204 10:57:38.299829292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3271218Z 
2025-12-04T11:11:26.3271713Z [W1204 10:57:38.300046668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3272340Z 
2025-12-04T11:11:26.3272848Z [W1204 10:57:38.305861917 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3273483Z 
2025-12-04T11:11:26.3273981Z [W1204 10:57:38.306469701 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3274606Z 
2025-12-04T11:11:26.3275105Z [W1204 10:57:38.306654886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3275748Z 
2025-12-04T11:11:26.3275870Z ('RERUN', {'yellow': True}) [19.2777s] [100%]
2025-12-04T11:11:26.3277365Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:39.659886477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3278731Z 
2025-12-04T11:11:26.3279237Z [W1204 10:57:39.660682635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3279863Z 
2025-12-04T11:11:26.3280435Z [W1204 10:57:39.660887393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3281063Z 
2025-12-04T11:11:26.3281624Z [W1204 10:57:39.664757815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3282263Z 
2025-12-04T11:11:26.3282835Z [W1204 10:57:39.665543509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3283473Z 
2025-12-04T11:11:26.3283975Z [W1204 10:57:39.665728223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3284666Z 
2025-12-04T11:11:26.3285165Z [W1204 10:57:39.671637399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3285790Z 
2025-12-04T11:11:26.3286296Z [W1204 10:57:39.672251490 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3286956Z 
2025-12-04T11:11:26.3287461Z [W1204 10:57:39.672430928 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3288107Z 
2025-12-04T11:11:26.3288608Z [W1204 10:57:39.757117197 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3289257Z 
2025-12-04T11:11:26.3289756Z [W1204 10:57:39.757834976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3290404Z 
2025-12-04T11:11:26.3290907Z [W1204 10:57:39.758031035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3291534Z 
2025-12-04T11:11:26.3292049Z [W1204 10:57:39.761811743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3292682Z 
2025-12-04T11:11:26.3293191Z [W1204 10:57:39.762433214 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3293820Z 
2025-12-04T11:11:26.3294316Z [W1204 10:57:39.762619013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3294964Z 
2025-12-04T11:11:26.3295466Z [W1204 10:57:39.768391914 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3296114Z 
2025-12-04T11:11:26.3296612Z [W1204 10:57:39.769143292 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3297250Z 
2025-12-04T11:11:26.3297759Z [W1204 10:57:39.769330098 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3298393Z 
2025-12-04T11:11:26.3298533Z ('RERUN', {'yellow': True}) [0.4232s] [100%]
2025-12-04T11:11:26.3300020Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:39.058944203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3301599Z 
2025-12-04T11:11:26.3302112Z [W1204 10:57:39.059688112 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3302770Z 
2025-12-04T11:11:26.3303271Z [W1204 10:57:39.059883102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3303907Z 
2025-12-04T11:11:26.3304425Z [W1204 10:57:39.063704186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3305056Z 
2025-12-04T11:11:26.3305571Z [W1204 10:57:39.064482534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3306202Z 
2025-12-04T11:11:26.3306702Z [W1204 10:57:39.064667513 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3307348Z 
2025-12-04T11:11:26.3307968Z [W1204 10:57:39.070634801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3308615Z 
2025-12-04T11:11:26.3309111Z [W1204 10:57:39.071281120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3309789Z 
2025-12-04T11:11:26.3310301Z [W1204 10:57:39.071464672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3310971Z 
2025-12-04T11:11:26.3311482Z [W1204 10:57:39.158411652 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3312112Z 
2025-12-04T11:11:26.3312609Z [W1204 10:57:39.159150971 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3313259Z 
2025-12-04T11:11:26.3313759Z [W1204 10:57:39.159351301 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3314404Z 
2025-12-04T11:11:26.3314902Z [W1204 10:57:39.163172391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3315535Z 
2025-12-04T11:11:26.3316049Z [W1204 10:57:39.163785695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3316687Z 
2025-12-04T11:11:26.3317205Z [W1204 10:57:39.163971763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3317841Z 
2025-12-04T11:11:26.3318344Z [W1204 10:57:39.169774134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3318992Z 
2025-12-04T11:11:26.3319496Z [W1204 10:57:39.170565570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3320143Z 
2025-12-04T11:11:26.3320643Z [W1204 10:57:39.170758653 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3321290Z 
2025-12-04T11:11:26.3321388Z FAILED [0.3998s] [100%]
2025-12-04T11:11:26.3321622Z 
2025-12-04T11:11:26.3321781Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.3322551Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3323302Z Traceback (most recent call last):
2025-12-04T11:11:26.3324044Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3324905Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3325714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3326468Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3327288Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3328143Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3328606Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3328863Z 
2025-12-04T11:11:26.3328972Z Expected 1 but got 2.
2025-12-04T11:11:26.3329257Z Absolute difference: 1
2025-12-04T11:11:26.3329530Z Relative difference: 1.0
2025-12-04T11:11:26.3329726Z 
2025-12-04T11:11:26.3329938Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3331247Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3332263Z 
2025-12-04T11:11:26.3332538Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3333150Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3333651Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3334385Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3335253Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3335737Z graph_break []
2025-12-04T11:11:26.3336103Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3337650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3339066Z   if out == self.unknown_value:
2025-12-04T11:11:26.3339993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3340944Z   warnings.warn(
2025-12-04T11:11:26.3341815Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3342749Z   warnings.warn(
2025-12-04T11:11:26.3343407Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3344159Z Traceback (most recent call last):
2025-12-04T11:11:26.3344884Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3345749Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3346562Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3347307Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3348113Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3348973Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3349434Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3349679Z 
2025-12-04T11:11:26.3349795Z Expected 1 but got 2.
2025-12-04T11:11:26.3350066Z Absolute difference: 1
2025-12-04T11:11:26.3350356Z Relative difference: 1.0
2025-12-04T11:11:26.3350540Z 
2025-12-04T11:11:26.3350760Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3351978Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3353002Z 
2025-12-04T11:11:26.3353266Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3353883Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3354354Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3355074Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3355956Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3356413Z graph_break []
2025-12-04T11:11:26.3356767Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3358364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3359788Z   if out == self.unknown_value:
2025-12-04T11:11:26.3360725Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3361762Z   warnings.warn(
2025-12-04T11:11:26.3362632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3363614Z   warnings.warn(
2025-12-04T11:11:26.3363986Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3364466Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3364884Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3365765Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3366515Z graph_break []
2025-12-04T11:11:26.3366865Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3367929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3368871Z   warnings.warn(
2025-12-04T11:11:26.3369740Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3370670Z   warnings.warn(
2025-12-04T11:11:26.3370974Z =================================== FAILURES ===================================
2025-12-04T11:11:26.3371759Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3372493Z Traceback (most recent call last):
2025-12-04T11:11:26.3373227Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3374092Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3374903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3375641Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3376457Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3377316Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3377772Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3378012Z 
2025-12-04T11:11:26.3378114Z Expected 1 but got 2.
2025-12-04T11:11:26.3378392Z Absolute difference: 1
2025-12-04T11:11:26.3378679Z Relative difference: 1.0
2025-12-04T11:11:26.3378866Z 
2025-12-04T11:11:26.3379075Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3380294Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3381311Z 
2025-12-04T11:11:26.3381569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3382180Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3382642Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3383373Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3384238Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3384696Z graph_break []
2025-12-04T11:11:26.3385047Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3386677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3388137Z   if out == self.unknown_value:
2025-12-04T11:11:26.3389069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3390034Z   warnings.warn(
2025-12-04T11:11:26.3390900Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3391835Z   warnings.warn(
2025-12-04T11:11:26.3392191Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3392653Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3393086Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3393954Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3394689Z graph_break []
2025-12-04T11:11:26.3395052Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3396114Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3397044Z   warnings.warn(
2025-12-04T11:11:26.3397903Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3398847Z   warnings.warn(
2025-12-04T11:11:26.3399220Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3399678Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3400111Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3401196Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3402010Z graph_break []
2025-12-04T11:11:26.3402379Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3403452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3404406Z   warnings.warn(
2025-12-04T11:11:26.3405261Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3406197Z   warnings.warn(
2025-12-04T11:11:26.3407181Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eb93cd35b9ecccb8.xml -
2025-12-04T11:11:26.3408305Z =========================== short test summary info ============================
2025-12-04T11:11:26.3409513Z FAILED [0.3998s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3410554Z 
2025-12-04T11:11:26.3410656Z Expected 1 but got 2.
2025-12-04T11:11:26.3410940Z Absolute difference: 1
2025-12-04T11:11:26.3411230Z Relative difference: 1.0
2025-12-04T11:11:26.3411416Z 
2025-12-04T11:11:26.3411622Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3413021Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3414033Z 
2025-12-04T11:11:26.3414307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3414890Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.3415436Z ================== 1 failed, 10 deselected, 2 rerun in 20.13s ==================
2025-12-04T11:11:26.3415873Z Got exit code 1
2025-12-04T11:11:26.3416132Z Retrying single test...
2025-12-04T11:11:26.3416734Z W1204 10:57:50.675000 88942 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.3417991Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63eb31d4436f1164.xml
2025-12-04T11:11:26.3418933Z ============================= test session starts ==============================
2025-12-04T11:11:26.3419590Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.3420168Z cachedir: .pytest_cache
2025-12-04T11:11:26.3420855Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.3421622Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.3421951Z configfile: pytest.ini
2025-12-04T11:11:26.3422658Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.3423538Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.3424845Z stepcurrent: skipping 1 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3426029Z Running 1 items in this shard
2025-12-04T11:11:26.3426244Z 
2025-12-04T11:11:26.3427490Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:57:54.760978103 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3428877Z 
2025-12-04T11:11:26.3429384Z [W1204 10:58:09.832971750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3430031Z 
2025-12-04T11:11:26.3430533Z [W1204 10:58:09.833218583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3431172Z 
2025-12-04T11:11:26.3431687Z [W1204 10:58:09.840333771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3432325Z 
2025-12-04T11:11:26.3432842Z [W1204 10:58:09.841000977 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3433477Z 
2025-12-04T11:11:26.3433973Z [W1204 10:58:09.841182409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3434618Z 
2025-12-04T11:11:26.3435118Z [W1204 10:58:09.847813932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3435763Z 
2025-12-04T11:11:26.3436263Z [W1204 10:58:09.848412743 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3436893Z 
2025-12-04T11:11:26.3437405Z [W1204 10:58:09.848591222 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3438035Z 
2025-12-04T11:11:26.3438615Z [W1204 10:58:11.787489885 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3439247Z 
2025-12-04T11:11:26.3439745Z [W1204 10:58:11.789188000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3440417Z 
2025-12-04T11:11:26.3440914Z [W1204 10:58:11.789390463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3441622Z 
2025-12-04T11:11:26.3442127Z [W1204 10:58:11.793245084 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3442801Z 
2025-12-04T11:11:26.3443316Z [W1204 10:58:11.793877334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3443950Z 
2025-12-04T11:11:26.3444465Z [W1204 10:58:11.794067164 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3445101Z 
2025-12-04T11:11:26.3445600Z [W1204 10:58:11.799941951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3446245Z 
2025-12-04T11:11:26.3446752Z [W1204 10:58:11.800589831 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3447400Z 
2025-12-04T11:11:26.3447901Z [W1204 10:58:11.800781886 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3448541Z 
2025-12-04T11:11:26.3448688Z ('RERUN', {'yellow': True}) [18.7960s] [100%]
2025-12-04T11:11:26.3450188Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:58:11.157128943 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3451558Z 
2025-12-04T11:11:26.3452065Z [W1204 10:58:11.157852060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3452717Z 
2025-12-04T11:11:26.3453226Z [W1204 10:58:11.158043979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3453880Z 
2025-12-04T11:11:26.3454380Z [W1204 10:58:11.161864609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3455014Z 
2025-12-04T11:11:26.3455533Z [W1204 10:58:11.162625941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3456162Z 
2025-12-04T11:11:26.3456679Z [W1204 10:58:11.162813994 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3457310Z 
2025-12-04T11:11:26.3457813Z [W1204 10:58:11.168628944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3458459Z 
2025-12-04T11:11:26.3458963Z [W1204 10:58:11.169218181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3459612Z 
2025-12-04T11:11:26.3460116Z [W1204 10:58:11.169398528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3460764Z 
2025-12-04T11:11:26.3461267Z [W1204 10:58:11.252434379 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3461900Z 
2025-12-04T11:11:26.3462411Z [W1204 10:58:11.253144550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3463042Z 
2025-12-04T11:11:26.3463626Z [W1204 10:58:11.253341009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3464261Z 
2025-12-04T11:11:26.3464759Z [W1204 10:58:11.257092232 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3465437Z 
2025-12-04T11:11:26.3465936Z [W1204 10:58:11.257696857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3466603Z 
2025-12-04T11:11:26.3467104Z [W1204 10:58:11.257886961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3467734Z 
2025-12-04T11:11:26.3468246Z [W1204 10:58:11.263746274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3468883Z 
2025-12-04T11:11:26.3469398Z [W1204 10:58:11.264530455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3470031Z 
2025-12-04T11:11:26.3470530Z [W1204 10:58:11.264719359 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3471176Z 
2025-12-04T11:11:26.3471302Z ('RERUN', {'yellow': True}) [0.4258s] [100%]
2025-12-04T11:11:26.3472793Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 [W1204 10:58:11.559615951 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3474154Z 
2025-12-04T11:11:26.3474672Z [W1204 10:58:11.560383229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3475305Z 
2025-12-04T11:11:26.3475821Z [W1204 10:58:11.560583110 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3476451Z 
2025-12-04T11:11:26.3476950Z [W1204 10:58:11.564350986 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3477595Z 
2025-12-04T11:11:26.3478096Z [W1204 10:58:11.565090353 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3478746Z 
2025-12-04T11:11:26.3479246Z [W1204 10:58:11.565275565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3479880Z 
2025-12-04T11:11:26.3480395Z [W1204 10:58:11.571094444 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3481028Z 
2025-12-04T11:11:26.3481608Z [W1204 10:58:11.571679492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3482240Z 
2025-12-04T11:11:26.3482742Z [W1204 10:58:11.571859442 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3483391Z 
2025-12-04T11:11:26.3483892Z [W1204 10:58:12.655516158 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3484535Z 
2025-12-04T11:11:26.3485036Z [W1204 10:58:12.656232026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3485671Z 
2025-12-04T11:11:26.3486184Z [W1204 10:58:12.656425076 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3486815Z 
2025-12-04T11:11:26.3487434Z [W1204 10:58:12.660194340 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3488068Z 
2025-12-04T11:11:26.3488568Z [W1204 10:58:12.660801764 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3489251Z 
2025-12-04T11:11:26.3489752Z [W1204 10:58:12.660989668 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3490397Z 
2025-12-04T11:11:26.3490901Z [W1204 10:58:12.666791750 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3491584Z 
2025-12-04T11:11:26.3492083Z [W1204 10:58:12.667550030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3492716Z 
2025-12-04T11:11:26.3493233Z [W1204 10:58:12.667737671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3493865Z 
2025-12-04T11:11:26.3493964Z FAILED [0.4002s] [100%]
2025-12-04T11:11:26.3494154Z 
2025-12-04T11:11:26.3494297Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.3495080Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3495833Z Traceback (most recent call last):
2025-12-04T11:11:26.3496556Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3497414Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3498228Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3498975Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3499780Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3500645Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3501352Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3501602Z 
2025-12-04T11:11:26.3501706Z Expected 1 but got 2.
2025-12-04T11:11:26.3501997Z Absolute difference: 1
2025-12-04T11:11:26.3502293Z Relative difference: 1.0
2025-12-04T11:11:26.3502478Z 
2025-12-04T11:11:26.3502701Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3503922Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3504947Z 
2025-12-04T11:11:26.3505209Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3505829Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3506299Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3507018Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3507893Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3508351Z graph_break []
2025-12-04T11:11:26.3508709Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3510236Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3511659Z   if out == self.unknown_value:
2025-12-04T11:11:26.3512582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3513691Z   warnings.warn(
2025-12-04T11:11:26.3514562Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3515558Z   warnings.warn(
2025-12-04T11:11:26.3516211Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3516948Z Traceback (most recent call last):
2025-12-04T11:11:26.3517730Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3518591Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3519399Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3520137Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3520959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3521901Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3522362Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3522619Z 
2025-12-04T11:11:26.3522723Z Expected 1 but got 2.
2025-12-04T11:11:26.3523008Z Absolute difference: 1
2025-12-04T11:11:26.3523286Z Relative difference: 1.0
2025-12-04T11:11:26.3523486Z 
2025-12-04T11:11:26.3523700Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3524925Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3525935Z 
2025-12-04T11:11:26.3526207Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3526830Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3527286Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3528015Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3528905Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3529352Z graph_break []
2025-12-04T11:11:26.3529722Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3531271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3532703Z   if out == self.unknown_value:
2025-12-04T11:11:26.3533625Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3534575Z   warnings.warn(
2025-12-04T11:11:26.3535448Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3536399Z   warnings.warn(
2025-12-04T11:11:26.3536760Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3537236Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3537676Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3538537Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3539451Z graph_break []
2025-12-04T11:11:26.3539833Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3541090Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3542123Z   warnings.warn(
2025-12-04T11:11:26.3542998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3543990Z   warnings.warn(
2025-12-04T11:11:26.3544284Z =================================== FAILURES ===================================
2025-12-04T11:11:26.3545105Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.3545856Z Traceback (most recent call last):
2025-12-04T11:11:26.3546587Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3547436Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3548248Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3548988Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3549805Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3550664Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3551135Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3551379Z 
2025-12-04T11:11:26.3551496Z Expected 1 but got 2.
2025-12-04T11:11:26.3551768Z Absolute difference: 1
2025-12-04T11:11:26.3552057Z Relative difference: 1.0
2025-12-04T11:11:26.3552242Z 
2025-12-04T11:11:26.3552465Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3553696Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3554713Z 
2025-12-04T11:11:26.3554974Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3555596Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3556076Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3556814Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3557681Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3558149Z graph_break []
2025-12-04T11:11:26.3558511Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3560041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3561550Z   if out == self.unknown_value:
2025-12-04T11:11:26.3562483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3563444Z   warnings.warn(
2025-12-04T11:11:26.3564352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3565302Z   warnings.warn(
2025-12-04T11:11:26.3565680Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3566151Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3566575Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3567454Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3568315Z graph_break []
2025-12-04T11:11:26.3568673Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3569402Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3569535Z   warnings.warn(
2025-12-04T11:11:26.3570265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3570423Z   warnings.warn(
2025-12-04T11:11:26.3570639Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3570768Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3570996Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3571519Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3571630Z graph_break []
2025-12-04T11:11:26.3571845Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3572580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3572678Z   warnings.warn(
2025-12-04T11:11:26.3573392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3573513Z   warnings.warn(
2025-12-04T11:11:26.3574341Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63eb31d4436f1164.xml -
2025-12-04T11:11:26.3574525Z =========================== short test summary info ============================
2025-12-04T11:11:26.3575448Z FAILED [0.4002s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3575456Z 
2025-12-04T11:11:26.3575564Z Expected 1 but got 2.
2025-12-04T11:11:26.3575682Z Absolute difference: 1
2025-12-04T11:11:26.3575795Z Relative difference: 1.0
2025-12-04T11:11:26.3575800Z 
2025-12-04T11:11:26.3576028Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3576912Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3576917Z 
2025-12-04T11:11:26.3577181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3577377Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.3577571Z ================== 1 failed, 10 deselected, 2 rerun in 19.65s ==================
2025-12-04T11:11:26.3577680Z Got exit code 1
2025-12-04T11:11:26.3578479Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.3578883Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:11:26.3579336Z W1204 10:58:23.242000 89116 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.3579983Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8fe2f36a52fbcf80.xml
2025-12-04T11:11:26.3580161Z ============================= test session starts ==============================
2025-12-04T11:11:26.3580572Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.3580683Z cachedir: .pytest_cache
2025-12-04T11:11:26.3581203Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.3581351Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.3581455Z configfile: pytest.ini
2025-12-04T11:11:26.3582002Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.3582238Z collecting ... collected 58 items / 2 deselected / 56 selected
2025-12-04T11:11:26.3582388Z stepcurrent: skipping 2 already run items.
2025-12-04T11:11:26.3582498Z Running 9 items in this shard
2025-12-04T11:11:26.3582503Z 
2025-12-04T11:11:26.3583360Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7787s] [ 11%]
2025-12-04T11:11:26.3584215Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4661s] [ 11%]
2025-12-04T11:11:26.3584975Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4632s] [ 11%]
2025-12-04T11:11:26.3584981Z 
2025-12-04T11:11:26.3585131Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.3585624Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3585744Z Traceback (most recent call last):
2025-12-04T11:11:26.3586258Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3586491Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3586959Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3587118Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3587647Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3587862Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3587992Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3587998Z 
2025-12-04T11:11:26.3588124Z Expected 1 but got 2.
2025-12-04T11:11:26.3588231Z Absolute difference: 1
2025-12-04T11:11:26.3588340Z Relative difference: 1.0
2025-12-04T11:11:26.3588345Z 
2025-12-04T11:11:26.3588571Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3589459Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3589464Z 
2025-12-04T11:11:26.3589744Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3589964Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3590078Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3590611Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3590834Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3590930Z graph_break []
2025-12-04T11:11:26.3591156Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3591933Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3592046Z   warnings.warn(
2025-12-04T11:11:26.3592753Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3592885Z   warnings.warn(
2025-12-04T11:11:26.3593395Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3593512Z Traceback (most recent call last):
2025-12-04T11:11:26.3594040Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3594279Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3594728Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3594898Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3595426Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3595628Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3595774Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3595779Z 
2025-12-04T11:11:26.3595881Z Expected 1 but got 2.
2025-12-04T11:11:26.3596000Z Absolute difference: 1
2025-12-04T11:11:26.3596108Z Relative difference: 1.0
2025-12-04T11:11:26.3596112Z 
2025-12-04T11:11:26.3596326Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3597229Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3597234Z 
2025-12-04T11:11:26.3597499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3597732Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3597847Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3598368Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3598608Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3598705Z graph_break []
2025-12-04T11:11:26.3598917Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3599651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3599749Z   warnings.warn(
2025-12-04T11:11:26.3600468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3600570Z   warnings.warn(
2025-12-04T11:11:26.3600784Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3601118Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3601340Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3601922Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3602038Z graph_break []
2025-12-04T11:11:26.3602253Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3602977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3603077Z   warnings.warn(
2025-12-04T11:11:26.3603910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3604025Z   warnings.warn(
2025-12-04T11:11:26.3604168Z =================================== FAILURES ===================================
2025-12-04T11:11:26.3604746Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3604867Z Traceback (most recent call last):
2025-12-04T11:11:26.3605365Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3605645Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3606093Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3606254Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3606801Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3607002Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3607147Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3607153Z 
2025-12-04T11:11:26.3607257Z Expected 1 but got 2.
2025-12-04T11:11:26.3607361Z Absolute difference: 1
2025-12-04T11:11:26.3607483Z Relative difference: 1.0
2025-12-04T11:11:26.3607488Z 
2025-12-04T11:11:26.3607700Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3608579Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3608600Z 
2025-12-04T11:11:26.3608864Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3609082Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3609216Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3609737Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3609964Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3610076Z graph_break []
2025-12-04T11:11:26.3610287Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3611019Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3611118Z   warnings.warn(
2025-12-04T11:11:26.3611828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3611937Z   warnings.warn(
2025-12-04T11:11:26.3612152Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3612264Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3612494Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3613010Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3613119Z graph_break []
2025-12-04T11:11:26.3613326Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3614038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3614147Z   warnings.warn(
2025-12-04T11:11:26.3614850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3614961Z   warnings.warn(
2025-12-04T11:11:26.3615234Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3615346Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3615580Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3616143Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3616237Z graph_break []
2025-12-04T11:11:26.3616461Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3617199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3617309Z   warnings.warn(
2025-12-04T11:11:26.3618012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3618115Z   warnings.warn(
2025-12-04T11:11:26.3618952Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8fe2f36a52fbcf80.xml -
2025-12-04T11:11:26.3619125Z =========================== short test summary info ============================
2025-12-04T11:11:26.3620053Z FAILED [0.4632s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3620061Z 
2025-12-04T11:11:26.3620167Z Expected 1 but got 2.
2025-12-04T11:11:26.3620271Z Absolute difference: 1
2025-12-04T11:11:26.3620388Z Relative difference: 1.0
2025-12-04T11:11:26.3620393Z 
2025-12-04T11:11:26.3620608Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3621507Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3621512Z 
2025-12-04T11:11:26.3621773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3621951Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.3622153Z =================== 1 failed, 2 deselected, 2 rerun in 4.74s ===================
2025-12-04T11:11:26.3622249Z Got exit code 1
2025-12-04T11:11:26.3622356Z Retrying single test...
2025-12-04T11:11:26.3622802Z W1204 10:58:42.773000 89292 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.3623444Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cee8502954df528c.xml
2025-12-04T11:11:26.3623616Z ============================= test session starts ==============================
2025-12-04T11:11:26.3623960Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.3624068Z cachedir: .pytest_cache
2025-12-04T11:11:26.3624589Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.3624710Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.3624829Z configfile: pytest.ini
2025-12-04T11:11:26.3625357Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.3625573Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.3626546Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3626727Z Running 1 items in this shard
2025-12-04T11:11:26.3626733Z 
2025-12-04T11:11:26.3627997Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:58:46.934870676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3628035Z 
2025-12-04T11:11:26.3628545Z [W1204 10:59:01.057249516 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3628583Z 
2025-12-04T11:11:26.3629099Z [W1204 10:59:01.057500915 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3629105Z 
2025-12-04T11:11:26.3629603Z [W1204 10:59:01.064606512 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3629608Z 
2025-12-04T11:11:26.3630111Z [W1204 10:59:01.065260370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3630129Z 
2025-12-04T11:11:26.3630624Z [W1204 10:59:01.065443341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3630631Z 
2025-12-04T11:11:26.3631126Z [W1204 10:59:01.072107141 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3631133Z 
2025-12-04T11:11:26.3631648Z [W1204 10:59:01.072730775 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3631653Z 
2025-12-04T11:11:26.3632147Z [W1204 10:59:01.072912663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3632152Z 
2025-12-04T11:11:26.3632665Z [W1204 10:59:03.015186102 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3632670Z 
2025-12-04T11:11:26.3633168Z [W1204 10:59:03.017137211 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3633175Z 
2025-12-04T11:11:26.3633685Z [W1204 10:59:03.017338573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3633691Z 
2025-12-04T11:11:26.3634187Z [W1204 10:59:03.021211821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3634192Z 
2025-12-04T11:11:26.3634685Z [W1204 10:59:03.021840383 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3634702Z 
2025-12-04T11:11:26.3635202Z [W1204 10:59:03.022030889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3635207Z 
2025-12-04T11:11:26.3635705Z [W1204 10:59:03.027927001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3635712Z 
2025-12-04T11:11:26.3636220Z [W1204 10:59:03.028534605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3636224Z 
2025-12-04T11:11:26.3636723Z [W1204 10:59:03.028721918 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3636727Z 
2025-12-04T11:11:26.3636868Z ('RERUN', {'yellow': True}) [18.9130s] [100%]
2025-12-04T11:11:26.3638176Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:59:03.445813069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3638182Z 
2025-12-04T11:11:26.3638697Z [W1204 10:59:03.446587586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3638732Z 
2025-12-04T11:11:26.3639232Z [W1204 10:59:03.446786996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3639236Z 
2025-12-04T11:11:26.3639777Z [W1204 10:59:03.450665381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3639782Z 
2025-12-04T11:11:26.3640277Z [W1204 10:59:03.451466801 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3640282Z 
2025-12-04T11:11:26.3640783Z [W1204 10:59:03.451656784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3640800Z 
2025-12-04T11:11:26.3641295Z [W1204 10:59:03.457536331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3641302Z 
2025-12-04T11:11:26.3641894Z [W1204 10:59:03.458156152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3641900Z 
2025-12-04T11:11:26.3642414Z [W1204 10:59:03.458338979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3642422Z 
2025-12-04T11:11:26.3642917Z [W1204 10:59:03.542263457 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3642922Z 
2025-12-04T11:11:26.3643436Z [W1204 10:59:03.543020452 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3643442Z 
2025-12-04T11:11:26.3643940Z [W1204 10:59:03.543224932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3643947Z 
2025-12-04T11:11:26.3644460Z [W1204 10:59:03.547039609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3644464Z 
2025-12-04T11:11:26.3644962Z [W1204 10:59:03.547649020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3644969Z 
2025-12-04T11:11:26.3645480Z [W1204 10:59:03.547837223 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3645484Z 
2025-12-04T11:11:26.3645981Z [W1204 10:59:03.553760295 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3645989Z 
2025-12-04T11:11:26.3646486Z [W1204 10:59:03.554557388 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3646507Z 
2025-12-04T11:11:26.3647002Z [W1204 10:59:03.554745043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3647007Z 
2025-12-04T11:11:26.3647131Z ('RERUN', {'yellow': True}) [0.4879s] [100%]
2025-12-04T11:11:26.3648391Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:59:04.914874047 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3648397Z 
2025-12-04T11:11:26.3648963Z [W1204 10:59:04.915594534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3648968Z 
2025-12-04T11:11:26.3649481Z [W1204 10:59:04.915786930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3649515Z 
2025-12-04T11:11:26.3650008Z [W1204 10:59:04.919600185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3650013Z 
2025-12-04T11:11:26.3650518Z [W1204 10:59:04.920379329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3650551Z 
2025-12-04T11:11:26.3651048Z [W1204 10:59:04.920570726 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3651053Z 
2025-12-04T11:11:26.3651548Z [W1204 10:59:04.926408373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3651571Z 
2025-12-04T11:11:26.3652065Z [W1204 10:59:04.927005548 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3652069Z 
2025-12-04T11:11:26.3652570Z [W1204 10:59:04.927187819 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3652575Z 
2025-12-04T11:11:26.3653081Z [W1204 10:59:04.012543738 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3653088Z 
2025-12-04T11:11:26.3653582Z [W1204 10:59:04.013309458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3653587Z 
2025-12-04T11:11:26.3654096Z [W1204 10:59:04.013518408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3654101Z 
2025-12-04T11:11:26.3654599Z [W1204 10:59:04.017365067 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3654604Z 
2025-12-04T11:11:26.3655112Z [W1204 10:59:04.017986550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3655119Z 
2025-12-04T11:11:26.3655619Z [W1204 10:59:04.018187722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3655625Z 
2025-12-04T11:11:26.3656133Z [W1204 10:59:04.024084465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3656139Z 
2025-12-04T11:11:26.3656683Z [W1204 10:59:04.024871088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3656688Z 
2025-12-04T11:11:26.3657189Z [W1204 10:59:04.025058807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3657206Z 
2025-12-04T11:11:26.3657306Z FAILED [0.4681s] [100%]
2025-12-04T11:11:26.3657313Z 
2025-12-04T11:11:26.3657453Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.3657963Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3658085Z Traceback (most recent call last):
2025-12-04T11:11:26.3658584Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3658824Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3659276Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3659531Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3660058Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3660259Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3660427Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3660432Z 
2025-12-04T11:11:26.3660536Z Expected 1 but got 2.
2025-12-04T11:11:26.3660644Z Absolute difference: 1
2025-12-04T11:11:26.3660763Z Relative difference: 1.0
2025-12-04T11:11:26.3660796Z 
2025-12-04T11:11:26.3661008Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3661914Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3661919Z 
2025-12-04T11:11:26.3662178Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3662397Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3662525Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3663042Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3663280Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3663379Z graph_break []
2025-12-04T11:11:26.3663589Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3664783Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3664896Z   if out == self.unknown_value:
2025-12-04T11:11:26.3665624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3665723Z   warnings.warn(
2025-12-04T11:11:26.3666426Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3666541Z   warnings.warn(
2025-12-04T11:11:26.3667039Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3667188Z Traceback (most recent call last):
2025-12-04T11:11:26.3667685Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3667913Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3668375Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3668542Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3669064Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3669282Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3669411Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3669416Z 
2025-12-04T11:11:26.3669532Z Expected 1 but got 2.
2025-12-04T11:11:26.3669638Z Absolute difference: 1
2025-12-04T11:11:26.3669749Z Relative difference: 1.0
2025-12-04T11:11:26.3669754Z 
2025-12-04T11:11:26.3669978Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3670866Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3670872Z 
2025-12-04T11:11:26.3671216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3671431Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3671541Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3672103Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3672327Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3672427Z graph_break []
2025-12-04T11:11:26.3672683Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3673874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3674005Z   if out == self.unknown_value:
2025-12-04T11:11:26.3674724Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3674823Z   warnings.warn(
2025-12-04T11:11:26.3675546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3675646Z   warnings.warn(
2025-12-04T11:11:26.3675874Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3675991Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3676216Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3676742Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3676840Z graph_break []
2025-12-04T11:11:26.3677054Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3677774Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3677873Z   warnings.warn(
2025-12-04T11:11:26.3678586Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3678682Z   warnings.warn(
2025-12-04T11:11:26.3678829Z =================================== FAILURES ===================================
2025-12-04T11:11:26.3679342Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3679462Z Traceback (most recent call last):
2025-12-04T11:11:26.3679974Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3680201Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3680649Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3680827Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3681354Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3681619Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3681766Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3681771Z 
2025-12-04T11:11:26.3681872Z Expected 1 but got 2.
2025-12-04T11:11:26.3681990Z Absolute difference: 1
2025-12-04T11:11:26.3682096Z Relative difference: 1.0
2025-12-04T11:11:26.3682101Z 
2025-12-04T11:11:26.3682312Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3683287Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3683293Z 
2025-12-04T11:11:26.3683556Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3683811Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3683925Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3684444Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3684710Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3684804Z graph_break []
2025-12-04T11:11:26.3685016Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3686219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3686337Z   if out == self.unknown_value:
2025-12-04T11:11:26.3687058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3687161Z   warnings.warn(
2025-12-04T11:11:26.3687867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3687982Z   warnings.warn(
2025-12-04T11:11:26.3688200Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3688326Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3688548Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3689065Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3689175Z graph_break []
2025-12-04T11:11:26.3689384Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3690093Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3690206Z   warnings.warn(
2025-12-04T11:11:26.3690906Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3691015Z   warnings.warn(
2025-12-04T11:11:26.3691224Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3691338Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3691571Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3692091Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3692201Z graph_break []
2025-12-04T11:11:26.3692411Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3693116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3693226Z   warnings.warn(
2025-12-04T11:11:26.3693928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3694026Z   warnings.warn(
2025-12-04T11:11:26.3694857Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cee8502954df528c.xml -
2025-12-04T11:11:26.3695086Z =========================== short test summary info ============================
2025-12-04T11:11:26.3696018Z FAILED [0.4681s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3696053Z 
2025-12-04T11:11:26.3696156Z Expected 1 but got 2.
2025-12-04T11:11:26.3696260Z Absolute difference: 1
2025-12-04T11:11:26.3696380Z Relative difference: 1.0
2025-12-04T11:11:26.3696416Z 
2025-12-04T11:11:26.3696632Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3697538Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3697543Z 
2025-12-04T11:11:26.3697813Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3697990Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.3698200Z ================== 1 failed, 10 deselected, 2 rerun in 19.90s ==================
2025-12-04T11:11:26.3698299Z Got exit code 1
2025-12-04T11:11:26.3698418Z Retrying single test...
2025-12-04T11:11:26.3698854Z W1204 10:59:15.431000 89473 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.3699494Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-48bbd6d243994e17.xml
2025-12-04T11:11:26.3699672Z ============================= test session starts ==============================
2025-12-04T11:11:26.3700015Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.3700124Z cachedir: .pytest_cache
2025-12-04T11:11:26.3700645Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.3700766Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.3701054Z configfile: pytest.ini
2025-12-04T11:11:26.3701587Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.3701802Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.3702787Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3702903Z Running 1 items in this shard
2025-12-04T11:11:26.3702908Z 
2025-12-04T11:11:26.3704177Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:59:18.597354727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3704183Z 
2025-12-04T11:11:26.3704691Z [W1204 10:59:34.254683869 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3704698Z 
2025-12-04T11:11:26.3705219Z [W1204 10:59:34.254938694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3705226Z 
2025-12-04T11:11:26.3705726Z [W1204 10:59:34.262074058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3705731Z 
2025-12-04T11:11:26.3706246Z [W1204 10:59:34.262766226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3706251Z 
2025-12-04T11:11:26.3706924Z [W1204 10:59:34.262949376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3706929Z 
2025-12-04T11:11:26.3707430Z [W1204 10:59:34.269618954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3707475Z 
2025-12-04T11:11:26.3707991Z [W1204 10:59:34.270251613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3707996Z 
2025-12-04T11:11:26.3708546Z [W1204 10:59:34.270434854 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3708550Z 
2025-12-04T11:11:26.3709064Z [W1204 10:59:36.209662939 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3709069Z 
2025-12-04T11:11:26.3709574Z [W1204 10:59:36.211414846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3709579Z 
2025-12-04T11:11:26.3710091Z [W1204 10:59:36.211622905 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3710099Z 
2025-12-04T11:11:26.3710595Z [W1204 10:59:36.215549934 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3710599Z 
2025-12-04T11:11:26.3711109Z [W1204 10:59:36.216197946 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3711116Z 
2025-12-04T11:11:26.3711610Z [W1204 10:59:36.216388543 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3711615Z 
2025-12-04T11:11:26.3712113Z [W1204 10:59:36.222483482 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3712132Z 
2025-12-04T11:11:26.3712626Z [W1204 10:59:36.223151602 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3712633Z 
2025-12-04T11:11:26.3713129Z [W1204 10:59:36.223339449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3713133Z 
2025-12-04T11:11:26.3713277Z ('RERUN', {'yellow': True}) [19.4599s] [100%]
2025-12-04T11:11:26.3714528Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:59:37.645191035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3714533Z 
2025-12-04T11:11:26.3715049Z [W1204 10:59:37.645959842 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3715054Z 
2025-12-04T11:11:26.3715551Z [W1204 10:59:37.646169365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3715558Z 
2025-12-04T11:11:26.3716070Z [W1204 10:59:37.650069536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3716074Z 
2025-12-04T11:11:26.3716574Z [W1204 10:59:37.650862953 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3716581Z 
2025-12-04T11:11:26.3717088Z [W1204 10:59:37.651047734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3717092Z 
2025-12-04T11:11:26.3717654Z [W1204 10:59:37.656932654 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3717658Z 
2025-12-04T11:11:26.3718156Z [W1204 10:59:37.657533138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3718199Z 
2025-12-04T11:11:26.3718699Z [W1204 10:59:37.657713579 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3718704Z 
2025-12-04T11:11:26.3719200Z [W1204 10:59:37.742076908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3719234Z 
2025-12-04T11:11:26.3719746Z [W1204 10:59:37.742823260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3719751Z 
2025-12-04T11:11:26.3720246Z [W1204 10:59:37.743018813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3720255Z 
2025-12-04T11:11:26.3720766Z [W1204 10:59:37.746849396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3720770Z 
2025-12-04T11:11:26.3721267Z [W1204 10:59:37.747460899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3721272Z 
2025-12-04T11:11:26.3721852Z [W1204 10:59:37.747648351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3721865Z 
2025-12-04T11:11:26.3722358Z [W1204 10:59:37.753588698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3722362Z 
2025-12-04T11:11:26.3722858Z [W1204 10:59:37.754408041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3722876Z 
2025-12-04T11:11:26.3723375Z [W1204 10:59:37.754594689 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3723380Z 
2025-12-04T11:11:26.3723506Z ('RERUN', {'yellow': True}) [0.4918s] [100%]
2025-12-04T11:11:26.3724765Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 [W1204 10:59:37.112188009 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3724771Z 
2025-12-04T11:11:26.3725267Z [W1204 10:59:37.112926268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3725271Z 
2025-12-04T11:11:26.3725783Z [W1204 10:59:37.113122062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3725788Z 
2025-12-04T11:11:26.3726285Z [W1204 10:59:37.116958308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3726289Z 
2025-12-04T11:11:26.3726800Z [W1204 10:59:37.117734360 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3726807Z 
2025-12-04T11:11:26.3727305Z [W1204 10:59:37.117923149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3727311Z 
2025-12-04T11:11:26.3727823Z [W1204 10:59:37.123862035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3727827Z 
2025-12-04T11:11:26.3728323Z [W1204 10:59:37.124498010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3728328Z 
2025-12-04T11:11:26.3728893Z [W1204 10:59:37.124681070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3728913Z 
2025-12-04T11:11:26.3729411Z [W1204 10:59:37.209717231 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3729442Z 
2025-12-04T11:11:26.3729941Z [W1204 10:59:37.210508883 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3729973Z 
2025-12-04T11:11:26.3730486Z [W1204 10:59:37.210714155 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3730491Z 
2025-12-04T11:11:26.3730990Z [W1204 10:59:37.214548285 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3730995Z 
2025-12-04T11:11:26.3731509Z [W1204 10:59:37.215170892 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3731514Z 
2025-12-04T11:11:26.3732010Z [W1204 10:59:37.215359696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3732017Z 
2025-12-04T11:11:26.3732529Z [W1204 10:59:37.221283010 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3732535Z 
2025-12-04T11:11:26.3733032Z [W1204 10:59:37.222080794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3733037Z 
2025-12-04T11:11:26.3733547Z [W1204 10:59:37.222278694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3733552Z 
2025-12-04T11:11:26.3733650Z FAILED [0.4656s] [100%]
2025-12-04T11:11:26.3733655Z 
2025-12-04T11:11:26.3733798Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.3734305Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3734426Z Traceback (most recent call last):
2025-12-04T11:11:26.3734939Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3735170Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3735623Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3735798Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3736322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3736526Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3736668Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3736673Z 
2025-12-04T11:11:26.3736777Z Expected 1 but got 2.
2025-12-04T11:11:26.3736896Z Absolute difference: 1
2025-12-04T11:11:26.3737005Z Relative difference: 1.0
2025-12-04T11:11:26.3737009Z 
2025-12-04T11:11:26.3737220Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3738119Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3738127Z 
2025-12-04T11:11:26.3738387Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3738618Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3738731Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3739314Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3739553Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3739656Z graph_break []
2025-12-04T11:11:26.3739898Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3741096Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3741257Z   if out == self.unknown_value:
2025-12-04T11:11:26.3741987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3742090Z   warnings.warn(
2025-12-04T11:11:26.3742801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3742915Z   warnings.warn(
2025-12-04T11:11:26.3743413Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3743555Z Traceback (most recent call last):
2025-12-04T11:11:26.3744056Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3744286Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3744747Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3744911Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3745434Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3745654Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3745785Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3745790Z 
2025-12-04T11:11:26.3745910Z Expected 1 but got 2.
2025-12-04T11:11:26.3746018Z Absolute difference: 1
2025-12-04T11:11:26.3746128Z Relative difference: 1.0
2025-12-04T11:11:26.3746132Z 
2025-12-04T11:11:26.3746359Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3747245Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3747252Z 
2025-12-04T11:11:26.3747531Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3747747Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3747861Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3748397Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3748620Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3748730Z graph_break []
2025-12-04T11:11:26.3748942Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3750120Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3750246Z   if out == self.unknown_value:
2025-12-04T11:11:26.3750957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3751056Z   warnings.warn(
2025-12-04T11:11:26.3751835Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3751937Z   warnings.warn(
2025-12-04T11:11:26.3752164Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3752306Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3752529Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3753061Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3753184Z graph_break []
2025-12-04T11:11:26.3753404Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3754115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3754218Z   warnings.warn(
2025-12-04T11:11:26.3754940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3755039Z   warnings.warn(
2025-12-04T11:11:26.3755178Z =================================== FAILURES ===================================
2025-12-04T11:11:26.3755690Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3755810Z Traceback (most recent call last):
2025-12-04T11:11:26.3756318Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3756543Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3756990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3757167Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3757691Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3757906Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3758035Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3758040Z 
2025-12-04T11:11:26.3758142Z Expected 1 but got 2.
2025-12-04T11:11:26.3758261Z Absolute difference: 1
2025-12-04T11:11:26.3758368Z Relative difference: 1.0
2025-12-04T11:11:26.3758375Z 
2025-12-04T11:11:26.3758581Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3759480Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3759485Z 
2025-12-04T11:11:26.3759755Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3759978Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3760091Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3760608Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3760841Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3760937Z graph_break []
2025-12-04T11:11:26.3761162Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3762408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3762527Z   if out == self.unknown_value:
2025-12-04T11:11:26.3763322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3763423Z   warnings.warn(
2025-12-04T11:11:26.3764140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3764270Z   warnings.warn(
2025-12-04T11:11:26.3764531Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3764686Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3764908Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3765422Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3765530Z graph_break []
2025-12-04T11:11:26.3765745Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3766471Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3766572Z   warnings.warn(
2025-12-04T11:11:26.3767276Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3767387Z   warnings.warn(
2025-12-04T11:11:26.3767601Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3767713Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3767945Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3768458Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3768565Z graph_break []
2025-12-04T11:11:26.3768779Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3769487Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3769601Z   warnings.warn(
2025-12-04T11:11:26.3770301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3770414Z   warnings.warn(
2025-12-04T11:11:26.3771234Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-48bbd6d243994e17.xml -
2025-12-04T11:11:26.3771401Z =========================== short test summary info ============================
2025-12-04T11:11:26.3772337Z FAILED [0.4656s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3772343Z 
2025-12-04T11:11:26.3772445Z Expected 1 but got 2.
2025-12-04T11:11:26.3772561Z Absolute difference: 1
2025-12-04T11:11:26.3772671Z Relative difference: 1.0
2025-12-04T11:11:26.3772676Z 
2025-12-04T11:11:26.3772887Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3773782Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3773789Z 
2025-12-04T11:11:26.3774050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3774242Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.3774435Z ================== 1 failed, 10 deselected, 2 rerun in 20.45s ==================
2025-12-04T11:11:26.3774606Z Got exit code 1
2025-12-04T11:11:26.3775422Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3775857Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:11:26.3776289Z W1204 10:59:48.576000 89654 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.3776979Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-04bee3cdcda101b6.xml
2025-12-04T11:11:26.3777141Z ============================= test session starts ==============================
2025-12-04T11:11:26.3777495Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.3777602Z cachedir: .pytest_cache
2025-12-04T11:11:26.3778117Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.3778252Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.3778357Z configfile: pytest.ini
2025-12-04T11:11:26.3778900Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.3779109Z collecting ... collected 58 items / 3 deselected / 55 selected
2025-12-04T11:11:26.3779248Z stepcurrent: skipping 3 already run items.
2025-12-04T11:11:26.3779374Z Running 8 items in this shard
2025-12-04T11:11:26.3779380Z 
2025-12-04T11:11:26.3780229Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.8029s] [ 12%]
2025-12-04T11:11:26.3781085Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4767s] [ 12%]
2025-12-04T11:11:26.3781838Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 FAILED [0.4701s] [ 12%]
2025-12-04T11:11:26.3781846Z 
2025-12-04T11:11:26.3781981Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.3782487Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3782608Z Traceback (most recent call last):
2025-12-04T11:11:26.3783119Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3783343Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3783796Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3783969Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3784496Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3784699Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3784842Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3784847Z 
2025-12-04T11:11:26.3784951Z Expected 1 but got 2.
2025-12-04T11:11:26.3785067Z Absolute difference: 1
2025-12-04T11:11:26.3785171Z Relative difference: 1.0
2025-12-04T11:11:26.3785175Z 
2025-12-04T11:11:26.3785385Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3786286Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3786357Z 
2025-12-04T11:11:26.3786615Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3786840Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3787019Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3787536Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3787774Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3787899Z graph_break []
2025-12-04T11:11:26.3788112Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3788842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3788939Z   warnings.warn(
2025-12-04T11:11:26.3789667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3789767Z   warnings.warn(
2025-12-04T11:11:26.3790263Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3790400Z Traceback (most recent call last):
2025-12-04T11:11:26.3790895Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3791138Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3791588Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3791750Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3792292Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3792500Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3792628Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3792648Z 
2025-12-04T11:11:26.3792754Z Expected 1 but got 2.
2025-12-04T11:11:26.3792862Z Absolute difference: 1
2025-12-04T11:11:26.3792984Z Relative difference: 1.0
2025-12-04T11:11:26.3792989Z 
2025-12-04T11:11:26.3793201Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3794092Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3794099Z 
2025-12-04T11:11:26.3794375Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3794589Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3794721Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3795242Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3795467Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3795581Z graph_break []
2025-12-04T11:11:26.3795794Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3796523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3796622Z   warnings.warn(
2025-12-04T11:11:26.3797330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3797439Z   warnings.warn(
2025-12-04T11:11:26.3797653Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3797826Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3798067Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3798583Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3798724Z graph_break []
2025-12-04T11:11:26.3798934Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3799997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3800188Z   warnings.warn(
2025-12-04T11:11:26.3801212Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3801345Z   warnings.warn(
2025-12-04T11:11:26.3801596Z =================================== FAILURES ===================================
2025-12-04T11:11:26.3802164Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3802381Z Traceback (most recent call last):
2025-12-04T11:11:26.3802986Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3803260Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3803758Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3804001Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3804538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3804879Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3805060Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3805065Z 
2025-12-04T11:11:26.3805204Z Expected 1 but got 2.
2025-12-04T11:11:26.3805392Z Absolute difference: 1
2025-12-04T11:11:26.3805537Z Relative difference: 1.0
2025-12-04T11:11:26.3805548Z 
2025-12-04T11:11:26.3805772Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3806796Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3806804Z 
2025-12-04T11:11:26.3807112Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3807411Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3807560Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3808115Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3808434Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3808597Z graph_break []
2025-12-04T11:11:26.3808900Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3809650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3809787Z   warnings.warn(
2025-12-04T11:11:26.3810547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3810714Z   warnings.warn(
2025-12-04T11:11:26.3811037Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3811184Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3811582Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3812180Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3812347Z graph_break []
2025-12-04T11:11:26.3812631Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3813439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3813617Z   warnings.warn(
2025-12-04T11:11:26.3814398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3814542Z   warnings.warn(
2025-12-04T11:11:26.3814767Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3815020Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3815281Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3815876Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3816018Z graph_break []
2025-12-04T11:11:26.3816262Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3817063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3817217Z   warnings.warn(
2025-12-04T11:11:26.3817956Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3818133Z   warnings.warn(
2025-12-04T11:11:26.3819005Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-04bee3cdcda101b6.xml -
2025-12-04T11:11:26.3819234Z =========================== short test summary info ============================
2025-12-04T11:11:26.3820241Z FAILED [0.4701s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3820249Z 
2025-12-04T11:11:26.3820458Z Expected 1 but got 2.
2025-12-04T11:11:26.3820599Z Absolute difference: 1
2025-12-04T11:11:26.3820749Z Relative difference: 1.0
2025-12-04T11:11:26.3820754Z 
2025-12-04T11:11:26.3821044Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3821937Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3821948Z 
2025-12-04T11:11:26.3822351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3822563Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.3822794Z =================== 1 failed, 3 deselected, 2 rerun in 4.78s ===================
2025-12-04T11:11:26.3822981Z Got exit code 1
2025-12-04T11:11:26.3823117Z Retrying single test...
2025-12-04T11:11:26.3823564Z W1204 11:00:08.194000 89830 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.3824342Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0653410d18e9d78e.xml
2025-12-04T11:11:26.3824537Z ============================= test session starts ==============================
2025-12-04T11:11:26.3824968Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.3825176Z cachedir: .pytest_cache
2025-12-04T11:11:26.3825724Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.3825972Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.3826143Z configfile: pytest.ini
2025-12-04T11:11:26.3826756Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.3827008Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.3828045Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3828220Z Running 1 items in this shard
2025-12-04T11:11:26.3828225Z 
2025-12-04T11:11:26.3829546Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:00:11.376811483 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3829554Z 
2025-12-04T11:11:26.3830179Z [W1204 11:00:27.915699899 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3830185Z 
2025-12-04T11:11:26.3830728Z [W1204 11:00:27.915955422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3830735Z 
2025-12-04T11:11:26.3831313Z [W1204 11:00:27.923070450 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3831318Z 
2025-12-04T11:11:26.3831848Z [W1204 11:00:27.923778594 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3831857Z 
2025-12-04T11:11:26.3832462Z [W1204 11:00:27.923963333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3832467Z 
2025-12-04T11:11:26.3833020Z [W1204 11:00:27.930633003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3833025Z 
2025-12-04T11:11:26.3833558Z [W1204 11:00:27.931289807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3833613Z 
2025-12-04T11:11:26.3834143Z [W1204 11:00:27.931476942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3834148Z 
2025-12-04T11:11:26.3834681Z [W1204 11:00:29.868191998 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3834685Z 
2025-12-04T11:11:26.3835247Z [W1204 11:00:29.869872704 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3835251Z 
2025-12-04T11:11:26.3835830Z [W1204 11:00:29.870099992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3835835Z 
2025-12-04T11:11:26.3836430Z [W1204 11:00:29.873871391 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3836438Z 
2025-12-04T11:11:26.3836973Z [W1204 11:00:29.874493890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3836977Z 
2025-12-04T11:11:26.3837553Z [W1204 11:00:29.874682459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3837558Z 
2025-12-04T11:11:26.3838161Z [W1204 11:00:29.880505725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3838166Z 
2025-12-04T11:11:26.3838757Z [W1204 11:00:29.881113208 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3838790Z 
2025-12-04T11:11:26.3839340Z [W1204 11:00:29.881300893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3839371Z 
2025-12-04T11:11:26.3839535Z ('RERUN', {'yellow': True}) [19.3569s] [100%]
2025-12-04T11:11:26.3840865Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:00:29.304864586 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3840870Z 
2025-12-04T11:11:26.3841407Z [W1204 11:00:29.305609619 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3841411Z 
2025-12-04T11:11:26.3842070Z [W1204 11:00:29.305810537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3842081Z 
2025-12-04T11:11:26.3842656Z [W1204 11:00:29.309631300 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3842663Z 
2025-12-04T11:11:26.3843265Z [W1204 11:00:29.310449639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3843270Z 
2025-12-04T11:11:26.3843801Z [W1204 11:00:29.310647500 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3843806Z 
2025-12-04T11:11:26.3844403Z [W1204 11:00:29.316496260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3844409Z 
2025-12-04T11:11:26.3844944Z [W1204 11:00:29.317127118 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3844951Z 
2025-12-04T11:11:26.3845547Z [W1204 11:00:29.317313857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3845554Z 
2025-12-04T11:11:26.3846103Z [W1204 11:00:29.400768315 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3846108Z 
2025-12-04T11:11:26.3846698Z [W1204 11:00:29.401536254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3846703Z 
2025-12-04T11:11:26.3847238Z [W1204 11:00:29.401737880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3847242Z 
2025-12-04T11:11:26.3847775Z [W1204 11:00:29.405561958 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3847782Z 
2025-12-04T11:11:26.3848346Z [W1204 11:00:29.406183261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3848351Z 
2025-12-04T11:11:26.3848928Z [W1204 11:00:29.406373932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3848932Z 
2025-12-04T11:11:26.3849530Z [W1204 11:00:29.412188351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3849535Z 
2025-12-04T11:11:26.3850144Z [W1204 11:00:29.412963014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3850150Z 
2025-12-04T11:11:26.3850729Z [W1204 11:00:29.413161970 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3850760Z 
2025-12-04T11:11:26.3850927Z ('RERUN', {'yellow': True}) [0.4942s] [100%]
2025-12-04T11:11:26.3852266Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:00:30.780534137 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3852301Z 
2025-12-04T11:11:26.3852870Z [W1204 11:00:30.781278196 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3852875Z 
2025-12-04T11:11:26.3853465Z [W1204 11:00:30.781472746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3853470Z 
2025-12-04T11:11:26.3854000Z [W1204 11:00:30.785269993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3854006Z 
2025-12-04T11:11:26.3854537Z [W1204 11:00:30.786038368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3854590Z 
2025-12-04T11:11:26.3855106Z [W1204 11:00:30.786234660 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3855111Z 
2025-12-04T11:11:26.3855686Z [W1204 11:00:30.792049527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3855691Z 
2025-12-04T11:11:26.3856293Z [W1204 11:00:30.792661229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3856297Z 
2025-12-04T11:11:26.3856831Z [W1204 11:00:30.792842829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3856838Z 
2025-12-04T11:11:26.3857484Z [W1204 11:00:30.877400876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3857490Z 
2025-12-04T11:11:26.3858132Z [W1204 11:00:30.878168999 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3858142Z 
2025-12-04T11:11:26.3858795Z [W1204 11:00:30.878370694 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3858803Z 
2025-12-04T11:11:26.3859409Z [W1204 11:00:30.882226433 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3859415Z 
2025-12-04T11:11:26.3860086Z [W1204 11:00:30.882866163 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3860096Z 
2025-12-04T11:11:26.3860628Z [W1204 11:00:30.883057308 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3860633Z 
2025-12-04T11:11:26.3861162Z [W1204 11:00:30.888899778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3861216Z 
2025-12-04T11:11:26.3861745Z [W1204 11:00:30.889700015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3861750Z 
2025-12-04T11:11:26.3862327Z [W1204 11:00:30.889890767 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3862407Z 
2025-12-04T11:11:26.3862609Z FAILED [0.4741s] [100%]
2025-12-04T11:11:26.3862614Z 
2025-12-04T11:11:26.3862797Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.3863421Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3863580Z Traceback (most recent call last):
2025-12-04T11:11:26.3864094Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3864501Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3864995Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3865248Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3865811Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3866052Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3866288Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3866294Z 
2025-12-04T11:11:26.3866453Z Expected 1 but got 2.
2025-12-04T11:11:26.3866652Z Absolute difference: 1
2025-12-04T11:11:26.3866794Z Relative difference: 1.0
2025-12-04T11:11:26.3866800Z 
2025-12-04T11:11:26.3867042Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3867989Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3867995Z 
2025-12-04T11:11:26.3868325Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3868647Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3868810Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3869365Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3869666Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3869778Z graph_break []
2025-12-04T11:11:26.3870062Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3871362Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3871513Z   if out == self.unknown_value:
2025-12-04T11:11:26.3872306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3872443Z   warnings.warn(
2025-12-04T11:11:26.3873159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3873397Z   warnings.warn(
2025-12-04T11:11:26.3873938Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3874140Z Traceback (most recent call last):
2025-12-04T11:11:26.3874668Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3874933Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3875472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3875700Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3876372Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3876611Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3876773Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3876807Z 
2025-12-04T11:11:26.3876973Z Expected 1 but got 2.
2025-12-04T11:11:26.3877156Z Absolute difference: 1
2025-12-04T11:11:26.3889821Z Relative difference: 1.0
2025-12-04T11:11:26.3889830Z 
2025-12-04T11:11:26.3890065Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3891115Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3891121Z 
2025-12-04T11:11:26.3891389Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3891622Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3891753Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3892278Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3892520Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3892623Z graph_break []
2025-12-04T11:11:26.3892838Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3894043Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3894159Z   if out == self.unknown_value:
2025-12-04T11:11:26.3894885Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3894988Z   warnings.warn(
2025-12-04T11:11:26.3895693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3895805Z   warnings.warn(
2025-12-04T11:11:26.3896021Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3896134Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3896371Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3896891Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3897004Z graph_break []
2025-12-04T11:11:26.3897216Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3897930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3898045Z   warnings.warn(
2025-12-04T11:11:26.3898743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3898859Z   warnings.warn(
2025-12-04T11:11:26.3898999Z =================================== FAILURES ===================================
2025-12-04T11:11:26.3899495Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.3899629Z Traceback (most recent call last):
2025-12-04T11:11:26.3900126Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.3900352Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.3901183Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.3901351Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.3901894Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.3902151Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.3902280Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3902285Z 
2025-12-04T11:11:26.3902451Z Expected 1 but got 2.
2025-12-04T11:11:26.3902557Z Absolute difference: 1
2025-12-04T11:11:26.3902665Z Relative difference: 1.0
2025-12-04T11:11:26.3902684Z 
2025-12-04T11:11:26.3902895Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3903780Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3903791Z 
2025-12-04T11:11:26.3904065Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3904282Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3904396Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3904927Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3905150Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3905260Z graph_break []
2025-12-04T11:11:26.3905471Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3906654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.3906784Z   if out == self.unknown_value:
2025-12-04T11:11:26.3907499Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3907611Z   warnings.warn(
2025-12-04T11:11:26.3908314Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3908413Z   warnings.warn(
2025-12-04T11:11:26.3908640Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3908750Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3908971Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3909495Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3909598Z graph_break []
2025-12-04T11:11:26.3909820Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3910525Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3910624Z   warnings.warn(
2025-12-04T11:11:26.3911341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3911442Z   warnings.warn(
2025-12-04T11:11:26.3911667Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.3911779Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.3912001Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.3912586Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.3912682Z graph_break []
2025-12-04T11:11:26.3912891Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.3913610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3913736Z   warnings.warn(
2025-12-04T11:11:26.3914452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.3914578Z   warnings.warn(
2025-12-04T11:11:26.3915395Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0653410d18e9d78e.xml -
2025-12-04T11:11:26.3915574Z =========================== short test summary info ============================
2025-12-04T11:11:26.3916499Z FAILED [0.4741s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.3916507Z 
2025-12-04T11:11:26.3916623Z Expected 1 but got 2.
2025-12-04T11:11:26.3916726Z Absolute difference: 1
2025-12-04T11:11:26.3916833Z Relative difference: 1.0
2025-12-04T11:11:26.3916838Z 
2025-12-04T11:11:26.3917062Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.3917952Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3919005Z 
2025-12-04T11:11:26.3919269Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.3919852Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.3920353Z ================== 1 failed, 10 deselected, 2 rerun in 20.36s ==================
2025-12-04T11:11:26.3920785Z Got exit code 1
2025-12-04T11:11:26.3921049Z Retrying single test...
2025-12-04T11:11:26.3921743Z W1204 11:00:41.304000 90011 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.3922962Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34a9d39084dff1b6.xml
2025-12-04T11:11:26.3923918Z ============================= test session starts ==============================
2025-12-04T11:11:26.3924573Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.3925153Z cachedir: .pytest_cache
2025-12-04T11:11:26.3925851Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.3926650Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.3927036Z configfile: pytest.ini
2025-12-04T11:11:26.3927766Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.3928654Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.3929969Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.3931164Z Running 1 items in this shard
2025-12-04T11:11:26.3931388Z 
2025-12-04T11:11:26.3932646Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:00:44.479874186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3934131Z 
2025-12-04T11:11:26.3934644Z [W1204 11:01:00.154424059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3935290Z 
2025-12-04T11:11:26.3935850Z [W1204 11:01:00.154673944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3936483Z 
2025-12-04T11:11:26.3937000Z [W1204 11:01:00.161921944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3937665Z 
2025-12-04T11:11:26.3938169Z [W1204 11:01:00.162664746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3938817Z 
2025-12-04T11:11:26.3939325Z [W1204 11:01:00.162851385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3939972Z 
2025-12-04T11:11:26.3940477Z [W1204 11:01:00.169598287 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3941128Z 
2025-12-04T11:11:26.3941634Z [W1204 11:01:00.170321879 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3942264Z 
2025-12-04T11:11:26.3942782Z [W1204 11:01:00.170516066 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3943418Z 
2025-12-04T11:11:26.3943923Z [W1204 11:01:02.114323524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3944575Z 
2025-12-04T11:11:26.3945080Z [W1204 11:01:02.116021614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3945731Z 
2025-12-04T11:11:26.3946236Z [W1204 11:01:02.116223485 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3946887Z 
2025-12-04T11:11:26.3947391Z [W1204 11:01:02.120059755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3948030Z 
2025-12-04T11:11:26.3948545Z [W1204 11:01:02.120684373 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3949188Z 
2025-12-04T11:11:26.3949708Z [W1204 11:01:02.120871428 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3950343Z 
2025-12-04T11:11:26.3950850Z [W1204 11:01:02.126763349 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3951501Z 
2025-12-04T11:11:26.3952017Z [W1204 11:01:02.127366744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3952665Z 
2025-12-04T11:11:26.3953168Z [W1204 11:01:02.127553255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3953803Z 
2025-12-04T11:11:26.3953952Z ('RERUN', {'yellow': True}) [19.4920s] [100%]
2025-12-04T11:11:26.3955449Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:01:02.564391588 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3956846Z 
2025-12-04T11:11:26.3957350Z [W1204 11:01:02.565143641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3958001Z 
2025-12-04T11:11:26.3958590Z [W1204 11:01:02.565335597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3959296Z 
2025-12-04T11:11:26.3959807Z [W1204 11:01:02.569124949 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3960480Z 
2025-12-04T11:11:26.3961003Z [W1204 11:01:02.569899636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3961742Z 
2025-12-04T11:11:26.3962265Z [W1204 11:01:02.570106817 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3962898Z 
2025-12-04T11:11:26.3963401Z [W1204 11:01:02.576024100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3964049Z 
2025-12-04T11:11:26.3964554Z [W1204 11:01:02.576624696 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3965208Z 
2025-12-04T11:11:26.3965714Z [W1204 11:01:02.576804818 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3966360Z 
2025-12-04T11:11:26.3966880Z [W1204 11:01:03.661656996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3967515Z 
2025-12-04T11:11:26.3968034Z [W1204 11:01:03.662423732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3968669Z 
2025-12-04T11:11:26.3969176Z [W1204 11:01:03.662621634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3969822Z 
2025-12-04T11:11:26.3970327Z [W1204 11:01:03.666419258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3970975Z 
2025-12-04T11:11:26.3971476Z [W1204 11:01:03.667050083 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3972115Z 
2025-12-04T11:11:26.3972637Z [W1204 11:01:03.667238592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3973271Z 
2025-12-04T11:11:26.3973787Z [W1204 11:01:03.673143161 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3974430Z 
2025-12-04T11:11:26.3974935Z [W1204 11:01:03.673977344 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3975585Z 
2025-12-04T11:11:26.3976091Z [W1204 11:01:03.674179117 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3976738Z 
2025-12-04T11:11:26.3976872Z ('RERUN', {'yellow': True}) [0.5083s] [100%]
2025-12-04T11:11:26.3978390Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 [W1204 11:01:03.045772888 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3979775Z 
2025-12-04T11:11:26.3980292Z [W1204 11:01:03.046535662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3980934Z 
2025-12-04T11:11:26.3981448Z [W1204 11:01:03.046729343 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3982083Z 
2025-12-04T11:11:26.3982670Z [W1204 11:01:03.050556942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3983324Z 
2025-12-04T11:11:26.3983824Z [W1204 11:01:03.051335059 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3984503Z 
2025-12-04T11:11:26.3985009Z [W1204 11:01:03.051519408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3985648Z 
2025-12-04T11:11:26.3986171Z [W1204 11:01:03.057381781 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3986838Z 
2025-12-04T11:11:26.3987365Z [W1204 11:01:03.057986486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3988006Z 
2025-12-04T11:11:26.3988505Z [W1204 11:01:03.058180921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3989155Z 
2025-12-04T11:11:26.3989657Z [W1204 11:01:03.144682376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3990309Z 
2025-12-04T11:11:26.3990817Z [W1204 11:01:03.145459685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3991451Z 
2025-12-04T11:11:26.3991968Z [W1204 11:01:03.145662803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3992609Z 
2025-12-04T11:11:26.3993121Z [W1204 11:01:03.149534778 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3993754Z 
2025-12-04T11:11:26.3994255Z [W1204 11:01:03.150199261 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3994901Z 
2025-12-04T11:11:26.3995409Z [W1204 11:01:03.150393747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3996060Z 
2025-12-04T11:11:26.3996559Z [W1204 11:01:03.156275070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3997192Z 
2025-12-04T11:11:26.3997706Z [W1204 11:01:03.157071345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3998342Z 
2025-12-04T11:11:26.3998860Z [W1204 11:01:03.157260667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.3999498Z 
2025-12-04T11:11:26.3999596Z FAILED [0.4830s] [100%]
2025-12-04T11:11:26.3999779Z 
2025-12-04T11:11:26.3999917Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4000708Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4001712Z Traceback (most recent call last):
2025-12-04T11:11:26.4002444Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4003318Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4004146Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4004902Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4005712Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4006579Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4007046Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4007298Z 
2025-12-04T11:11:26.4007565Z Expected 1 but got 2.
2025-12-04T11:11:26.4007852Z Absolute difference: 1
2025-12-04T11:11:26.4008144Z Relative difference: 1.0
2025-12-04T11:11:26.4008330Z 
2025-12-04T11:11:26.4008534Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4009815Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4010850Z 
2025-12-04T11:11:26.4011156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4011779Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4012236Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4012971Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4013859Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4014321Z graph_break []
2025-12-04T11:11:26.4014677Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4016224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4017656Z   if out == self.unknown_value:
2025-12-04T11:11:26.4018597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4019537Z   warnings.warn(
2025-12-04T11:11:26.4020407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4021360Z   warnings.warn(
2025-12-04T11:11:26.4022016Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4022758Z Traceback (most recent call last):
2025-12-04T11:11:26.4023491Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4024358Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4025157Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4025907Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4026718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4027582Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4028033Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4028295Z 
2025-12-04T11:11:26.4028397Z Expected 1 but got 2.
2025-12-04T11:11:26.4028679Z Absolute difference: 1
2025-12-04T11:11:26.4028960Z Relative difference: 1.0
2025-12-04T11:11:26.4029157Z 
2025-12-04T11:11:26.4029367Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4030591Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4031607Z 
2025-12-04T11:11:26.4031885Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4032491Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4032958Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4033692Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4034645Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4035112Z graph_break []
2025-12-04T11:11:26.4035486Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4037058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4038520Z   if out == self.unknown_value:
2025-12-04T11:11:26.4039464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4040419Z   warnings.warn(
2025-12-04T11:11:26.4041293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4042298Z   warnings.warn(
2025-12-04T11:11:26.4042678Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4043156Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4043584Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4044471Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4045227Z graph_break []
2025-12-04T11:11:26.4045600Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4046662Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4047608Z   warnings.warn(
2025-12-04T11:11:26.4048481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4049412Z   warnings.warn(
2025-12-04T11:11:26.4049723Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4050512Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4051282Z Traceback (most recent call last):
2025-12-04T11:11:26.4052006Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4052874Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4053685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4054433Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4055242Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4056103Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4056567Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4056815Z 
2025-12-04T11:11:26.4056918Z Expected 1 but got 2.
2025-12-04T11:11:26.4057205Z Absolute difference: 1
2025-12-04T11:11:26.4057498Z Relative difference: 1.0
2025-12-04T11:11:26.4057682Z 
2025-12-04T11:11:26.4057905Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4059136Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4060164Z 
2025-12-04T11:11:26.4060427Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4061043Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4061615Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4062339Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4063253Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4063716Z graph_break []
2025-12-04T11:11:26.4064071Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4065607Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4067065Z   if out == self.unknown_value:
2025-12-04T11:11:26.4067989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4068931Z   warnings.warn(
2025-12-04T11:11:26.4069801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4070751Z   warnings.warn(
2025-12-04T11:11:26.4071126Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4071584Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4072031Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4072915Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4073654Z graph_break []
2025-12-04T11:11:26.4074024Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4075103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4076047Z   warnings.warn(
2025-12-04T11:11:26.4076895Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4077832Z   warnings.warn(
2025-12-04T11:11:26.4078202Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4078654Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4079091Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4079963Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4080712Z graph_break []
2025-12-04T11:11:26.4081064Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4082211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4083163Z   warnings.warn(
2025-12-04T11:11:26.4084029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4084959Z   warnings.warn(
2025-12-04T11:11:26.4085941Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34a9d39084dff1b6.xml -
2025-12-04T11:11:26.4087068Z =========================== short test summary info ============================
2025-12-04T11:11:26.4088300Z FAILED [0.4830s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4089336Z 
2025-12-04T11:11:26.4089535Z Expected 1 but got 2.
2025-12-04T11:11:26.4089820Z Absolute difference: 1
2025-12-04T11:11:26.4090111Z Relative difference: 1.0
2025-12-04T11:11:26.4090296Z 
2025-12-04T11:11:26.4090504Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4091785Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4092935Z 
2025-12-04T11:11:26.4093199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4093776Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4094274Z ================== 1 failed, 10 deselected, 2 rerun in 20.52s ==================
2025-12-04T11:11:26.4094709Z Got exit code 1
2025-12-04T11:11:26.4095682Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4097024Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:11:26.4097998Z W1204 11:01:14.888000 90193 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4099218Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c9ee3a2d8186602.xml
2025-12-04T11:11:26.4100177Z ============================= test session starts ==============================
2025-12-04T11:11:26.4101000Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4101579Z cachedir: .pytest_cache
2025-12-04T11:11:26.4102272Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4103048Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4103381Z configfile: pytest.ini
2025-12-04T11:11:26.4104093Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4104979Z collecting ... collected 58 items / 4 deselected / 54 selected
2025-12-04T11:11:26.4105460Z stepcurrent: skipping 4 already run items.
2025-12-04T11:11:26.4105824Z Running 7 items in this shard
2025-12-04T11:11:26.4106046Z 
2025-12-04T11:11:26.4106888Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7669s] [ 14%]
2025-12-04T11:11:26.4108691Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4444s] [ 14%]
2025-12-04T11:11:26.4110416Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 FAILED [0.4406s] [ 14%]
2025-12-04T11:11:26.4111304Z 
2025-12-04T11:11:26.4111460Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4112222Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4112970Z Traceback (most recent call last):
2025-12-04T11:11:26.4113709Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4114565Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4115380Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4116134Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4117102Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4117963Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4118489Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4118735Z 
2025-12-04T11:11:26.4118855Z Expected 1 but got 2.
2025-12-04T11:11:26.4119132Z Absolute difference: 1
2025-12-04T11:11:26.4119428Z Relative difference: 1.0
2025-12-04T11:11:26.4119675Z 
2025-12-04T11:11:26.4119884Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4121115Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4122204Z 
2025-12-04T11:11:26.4122469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4123096Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4123572Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4124662Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4125891Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4126351Z graph_break []
2025-12-04T11:11:26.4126714Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4127786Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4128722Z   warnings.warn(
2025-12-04T11:11:26.4129602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4130543Z   warnings.warn(
2025-12-04T11:11:26.4131181Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4131928Z Traceback (most recent call last):
2025-12-04T11:11:26.4132662Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4133530Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4134333Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4135079Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4135903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4136781Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4137236Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4137495Z 
2025-12-04T11:11:26.4137600Z Expected 1 but got 2.
2025-12-04T11:11:26.4137884Z Absolute difference: 1
2025-12-04T11:11:26.4138166Z Relative difference: 1.0
2025-12-04T11:11:26.4138362Z 
2025-12-04T11:11:26.4138569Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4139792Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4140797Z 
2025-12-04T11:11:26.4141072Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4141674Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4142137Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4143312Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4144542Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4145043Z graph_break []
2025-12-04T11:11:26.4145414Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4146493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4147470Z   warnings.warn(
2025-12-04T11:11:26.4148349Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4149297Z   warnings.warn(
2025-12-04T11:11:26.4149673Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4150138Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4150572Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4151799Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4152905Z graph_break []
2025-12-04T11:11:26.4153255Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4154323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4155265Z   warnings.warn(
2025-12-04T11:11:26.4156115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4157063Z   warnings.warn(
2025-12-04T11:11:26.4157368Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4158144Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4158879Z Traceback (most recent call last):
2025-12-04T11:11:26.4159610Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4160471Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4161269Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4162103Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4162927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4163801Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4164259Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4164519Z 
2025-12-04T11:11:26.4164621Z Expected 1 but got 2.
2025-12-04T11:11:26.4164903Z Absolute difference: 1
2025-12-04T11:11:26.4165174Z Relative difference: 1.0
2025-12-04T11:11:26.4165370Z 
2025-12-04T11:11:26.4165578Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4166803Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4167811Z 
2025-12-04T11:11:26.4168083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4168688Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4169155Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4170367Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4171624Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4172076Z graph_break []
2025-12-04T11:11:26.4172447Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4173520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4174507Z   warnings.warn(
2025-12-04T11:11:26.4175363Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4176307Z   warnings.warn(
2025-12-04T11:11:26.4176683Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4177143Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4177585Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4178814Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4179915Z graph_break []
2025-12-04T11:11:26.4180279Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4181352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4182301Z   warnings.warn(
2025-12-04T11:11:26.4183178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4184111Z   warnings.warn(
2025-12-04T11:11:26.4184484Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4184952Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4185380Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4186622Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4187734Z graph_break []
2025-12-04T11:11:26.4188107Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4189170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4190118Z   warnings.warn(
2025-12-04T11:11:26.4190996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4191938Z   warnings.warn(
2025-12-04T11:11:26.4192912Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c9ee3a2d8186602.xml -
2025-12-04T11:11:26.4194034Z =========================== short test summary info ============================
2025-12-04T11:11:26.4195264Z FAILED [0.4406s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4196288Z 
2025-12-04T11:11:26.4196402Z Expected 1 but got 2.
2025-12-04T11:11:26.4196675Z Absolute difference: 1
2025-12-04T11:11:26.4196968Z Relative difference: 1.0
2025-12-04T11:11:26.4197154Z 
2025-12-04T11:11:26.4197461Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4198672Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4199724Z 
2025-12-04T11:11:26.4199983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4200564Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4201309Z =================== 1 failed, 4 deselected, 2 rerun in 4.68s ===================
2025-12-04T11:11:26.4201789Z Got exit code 1
2025-12-04T11:11:26.4202052Z Retrying single test...
2025-12-04T11:11:26.4202671Z W1204 11:01:34.951000 90362 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4203892Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-126fca4cd7b29c10.xml
2025-12-04T11:11:26.4204829Z ============================= test session starts ==============================
2025-12-04T11:11:26.4205480Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4206074Z cachedir: .pytest_cache
2025-12-04T11:11:26.4206764Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4207517Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4207863Z configfile: pytest.ini
2025-12-04T11:11:26.4208581Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4209448Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.4210762Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4211959Z Running 1 items in this shard
2025-12-04T11:11:26.4212164Z 
2025-12-04T11:11:26.4213412Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:01:40.977822627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4214786Z 
2025-12-04T11:11:26.4215310Z [W1204 11:01:55.001878862 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4215953Z 
2025-12-04T11:11:26.4216455Z [W1204 11:01:55.002137378 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4217099Z 
2025-12-04T11:11:26.4217605Z [W1204 11:01:55.009507138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4218250Z 
2025-12-04T11:11:26.4218756Z [W1204 11:01:55.010339312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4219390Z 
2025-12-04T11:11:26.4219904Z [W1204 11:01:55.010542993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4220537Z 
2025-12-04T11:11:26.4221055Z [W1204 11:01:55.017517850 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4221685Z 
2025-12-04T11:11:26.4222189Z [W1204 11:01:55.018375575 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4222835Z 
2025-12-04T11:11:26.4223495Z [W1204 11:01:55.018561389 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4224152Z 
2025-12-04T11:11:26.4224657Z [W1204 11:01:55.155353990 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4225340Z 
2025-12-04T11:11:26.4225863Z [W1204 11:01:55.157186180 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4226501Z 
2025-12-04T11:11:26.4227061Z [W1204 11:01:55.157397471 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4227697Z 
2025-12-04T11:11:26.4228198Z [W1204 11:01:55.161541591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4228846Z 
2025-12-04T11:11:26.4229351Z [W1204 11:01:55.162255472 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4229997Z 
2025-12-04T11:11:26.4230497Z [W1204 11:01:55.162449046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4231135Z 
2025-12-04T11:11:26.4231648Z [W1204 11:01:55.168593685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4232280Z 
2025-12-04T11:11:26.4232795Z [W1204 11:01:55.169355035 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4233429Z 
2025-12-04T11:11:26.4233930Z [W1204 11:01:55.169553439 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4234580Z 
2025-12-04T11:11:26.4234711Z ('RERUN', {'yellow': True}) [18.8635s] [100%]
2025-12-04T11:11:26.4236210Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:01:55.571798181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4237582Z 
2025-12-04T11:11:26.4238086Z [W1204 11:01:55.572551053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4238720Z 
2025-12-04T11:11:26.4239234Z [W1204 11:01:55.572751573 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4239873Z 
2025-12-04T11:11:26.4240383Z [W1204 11:01:55.576698070 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4241016Z 
2025-12-04T11:11:26.4241611Z [W1204 11:01:55.577314176 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4242265Z 
2025-12-04T11:11:26.4242765Z [W1204 11:01:55.577500249 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4243415Z 
2025-12-04T11:11:26.4243915Z [W1204 11:01:55.583615855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4244548Z 
2025-12-04T11:11:26.4245062Z [W1204 11:01:55.584260679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4245696Z 
2025-12-04T11:11:26.4246215Z [W1204 11:01:55.584443541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4246848Z 
2025-12-04T11:11:26.4247346Z [W1204 11:01:56.671311198 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4248087Z 
2025-12-04T11:11:26.4248586Z [W1204 11:01:56.672069008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4249229Z 
2025-12-04T11:11:26.4249765Z [W1204 11:01:56.672266218 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4250398Z 
2025-12-04T11:11:26.4250910Z [W1204 11:01:56.676115260 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4251577Z 
2025-12-04T11:11:26.4252092Z [W1204 11:01:56.676742676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4252725Z 
2025-12-04T11:11:26.4253227Z [W1204 11:01:56.676931833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4253874Z 
2025-12-04T11:11:26.4254376Z [W1204 11:01:56.682872253 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4255022Z 
2025-12-04T11:11:26.4255521Z [W1204 11:01:56.683703487 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4256158Z 
2025-12-04T11:11:26.4256670Z [W1204 11:01:56.683892549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4257304Z 
2025-12-04T11:11:26.4257446Z ('RERUN', {'yellow': True}) [0.4738s] [100%]
2025-12-04T11:11:26.4258920Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:01:56.022280564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4260298Z 
2025-12-04T11:11:26.4260808Z [W1204 11:01:56.023015910 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4261460Z 
2025-12-04T11:11:26.4261963Z [W1204 11:01:56.023209421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4262598Z 
2025-12-04T11:11:26.4263115Z [W1204 11:01:56.027144697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4263747Z 
2025-12-04T11:11:26.4264259Z [W1204 11:01:56.027760459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4264888Z 
2025-12-04T11:11:26.4265390Z [W1204 11:01:56.027945369 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4266035Z 
2025-12-04T11:11:26.4266540Z [W1204 11:01:56.033998053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4267185Z 
2025-12-04T11:11:26.4267687Z [W1204 11:01:56.034643585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4268336Z 
2025-12-04T11:11:26.4268833Z [W1204 11:01:56.034824921 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4269469Z 
2025-12-04T11:11:26.4269982Z [W1204 11:01:56.123174276 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4270610Z 
2025-12-04T11:11:26.4271123Z [W1204 11:01:56.123961593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4271756Z 
2025-12-04T11:11:26.4272317Z [W1204 11:01:56.124165426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4272961Z 
2025-12-04T11:11:26.4273464Z [W1204 11:01:56.128127390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4274138Z 
2025-12-04T11:11:26.4274640Z [W1204 11:01:56.128802321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4275306Z 
2025-12-04T11:11:26.4275821Z [W1204 11:01:56.128994523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4276457Z 
2025-12-04T11:11:26.4276973Z [W1204 11:01:56.134997557 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4277602Z 
2025-12-04T11:11:26.4278111Z [W1204 11:01:56.135883643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4278754Z 
2025-12-04T11:11:26.4279257Z [W1204 11:01:56.136073684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4279908Z 
2025-12-04T11:11:26.4280008Z FAILED [0.4510s] [100%]
2025-12-04T11:11:26.4280181Z 
2025-12-04T11:11:26.4280337Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4281104Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4281917Z Traceback (most recent call last):
2025-12-04T11:11:26.4282657Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4283526Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4284336Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4285095Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4285926Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4286787Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4287260Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4287525Z 
2025-12-04T11:11:26.4287633Z Expected 1 but got 2.
2025-12-04T11:11:26.4287916Z Absolute difference: 1
2025-12-04T11:11:26.4288196Z Relative difference: 1.0
2025-12-04T11:11:26.4288397Z 
2025-12-04T11:11:26.4288604Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4289832Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4290845Z 
2025-12-04T11:11:26.4291128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4291740Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4292215Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4293307Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4294540Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4294986Z graph_break []
2025-12-04T11:11:26.4295355Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4296966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4298375Z   if out == self.unknown_value:
2025-12-04T11:11:26.4299303Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4300289Z   warnings.warn(
2025-12-04T11:11:26.4301345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4302348Z   warnings.warn(
2025-12-04T11:11:26.4303002Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4303753Z Traceback (most recent call last):
2025-12-04T11:11:26.4304483Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4305337Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4306149Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4306900Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4307709Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4308576Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4309042Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4309288Z 
2025-12-04T11:11:26.4309404Z Expected 1 but got 2.
2025-12-04T11:11:26.4309674Z Absolute difference: 1
2025-12-04T11:11:26.4309964Z Relative difference: 1.0
2025-12-04T11:11:26.4310149Z 
2025-12-04T11:11:26.4310370Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4311585Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4312603Z 
2025-12-04T11:11:26.4312865Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4313486Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4313957Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4315026Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4316257Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4316717Z graph_break []
2025-12-04T11:11:26.4317087Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4318615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4320040Z   if out == self.unknown_value:
2025-12-04T11:11:26.4320977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4321992Z   warnings.warn(
2025-12-04T11:11:26.4322850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4323796Z   warnings.warn(
2025-12-04T11:11:26.4324170Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4324638Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4325061Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4326401Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4327558Z graph_break []
2025-12-04T11:11:26.4327908Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4328975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4329957Z   warnings.warn(
2025-12-04T11:11:26.4330825Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4331757Z   warnings.warn(
2025-12-04T11:11:26.4332062Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4332847Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4333590Z Traceback (most recent call last):
2025-12-04T11:11:26.4334314Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4335187Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4335991Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4336725Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4337544Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4338408Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4338871Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4339118Z 
2025-12-04T11:11:26.4339229Z Expected 1 but got 2.
2025-12-04T11:11:26.4339514Z Absolute difference: 1
2025-12-04T11:11:26.4339805Z Relative difference: 1.0
2025-12-04T11:11:26.4339987Z 
2025-12-04T11:11:26.4340197Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4341422Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4342436Z 
2025-12-04T11:11:26.4342696Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4343313Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4343766Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4344854Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4346077Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4346536Z graph_break []
2025-12-04T11:11:26.4346886Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4348420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4349854Z   if out == self.unknown_value:
2025-12-04T11:11:26.4350793Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4351733Z   warnings.warn(
2025-12-04T11:11:26.4352705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4353656Z   warnings.warn(
2025-12-04T11:11:26.4354035Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4354526Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4354967Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4355836Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4355982Z graph_break []
2025-12-04T11:11:26.4356198Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4356927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4357028Z   warnings.warn(
2025-12-04T11:11:26.4357735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4357848Z   warnings.warn(
2025-12-04T11:11:26.4358063Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4358177Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4358413Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4359285Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4359394Z graph_break []
2025-12-04T11:11:26.4359606Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4360321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4360432Z   warnings.warn(
2025-12-04T11:11:26.4361132Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4361245Z   warnings.warn(
2025-12-04T11:11:26.4362144Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-126fca4cd7b29c10.xml -
2025-12-04T11:11:26.4362319Z =========================== short test summary info ============================
2025-12-04T11:11:26.4363245Z FAILED [0.4510s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4363251Z 
2025-12-04T11:11:26.4363359Z Expected 1 but got 2.
2025-12-04T11:11:26.4363480Z Absolute difference: 1
2025-12-04T11:11:26.4363585Z Relative difference: 1.0
2025-12-04T11:11:26.4363590Z 
2025-12-04T11:11:26.4363802Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4364701Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4364706Z 
2025-12-04T11:11:26.4364968Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4365155Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4365348Z ================== 1 failed, 10 deselected, 2 rerun in 19.82s ==================
2025-12-04T11:11:26.4365443Z Got exit code 1
2025-12-04T11:11:26.4365561Z Retrying single test...
2025-12-04T11:11:26.4366066Z W1204 11:02:07.933000 90536 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4366717Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eddfed0d2b029629.xml
2025-12-04T11:11:26.4366927Z ============================= test session starts ==============================
2025-12-04T11:11:26.4367272Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4367389Z cachedir: .pytest_cache
2025-12-04T11:11:26.4367930Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4368052Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4368168Z configfile: pytest.ini
2025-12-04T11:11:26.4368700Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4368918Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.4369886Z stepcurrent: skipping 4 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4370000Z Running 1 items in this shard
2025-12-04T11:11:26.4370005Z 
2025-12-04T11:11:26.4371251Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:02:13.903991753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4371259Z 
2025-12-04T11:11:26.4371767Z [W1204 11:02:28.615233474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4371772Z 
2025-12-04T11:11:26.4372292Z [W1204 11:02:28.615492186 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4372297Z 
2025-12-04T11:11:26.4372794Z [W1204 11:02:29.622862288 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4372802Z 
2025-12-04T11:11:26.4373314Z [W1204 11:02:29.623607436 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4373319Z 
2025-12-04T11:11:26.4373817Z [W1204 11:02:29.623802286 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4373821Z 
2025-12-04T11:11:26.4374320Z [W1204 11:02:29.630757437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4374337Z 
2025-12-04T11:11:26.4374839Z [W1204 11:02:29.631581072 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4374844Z 
2025-12-04T11:11:26.4375342Z [W1204 11:02:29.631769906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4375349Z 
2025-12-04T11:11:26.4375860Z [W1204 11:02:29.766370804 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4375864Z 
2025-12-04T11:11:26.4376361Z [W1204 11:02:29.768093826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4376368Z 
2025-12-04T11:11:26.4376878Z [W1204 11:02:29.768294765 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4376883Z 
2025-12-04T11:11:26.4377434Z [W1204 11:02:29.772229945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4377439Z 
2025-12-04T11:11:26.4377952Z [W1204 11:02:29.772886409 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4377989Z 
2025-12-04T11:11:26.4378485Z [W1204 11:02:29.773078770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4378491Z 
2025-12-04T11:11:26.4379004Z [W1204 11:02:29.779024015 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4379061Z 
2025-12-04T11:11:26.4379560Z [W1204 11:02:29.779657627 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4379564Z 
2025-12-04T11:11:26.4380061Z [W1204 11:02:29.779849749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4380083Z 
2025-12-04T11:11:26.4380212Z ('RERUN', {'yellow': True}) [19.4936s] [100%]
2025-12-04T11:11:26.4381451Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:02:29.179029099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4381459Z 
2025-12-04T11:11:26.4381971Z [W1204 11:02:29.179773396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4381978Z 
2025-12-04T11:11:26.4382472Z [W1204 11:02:29.179969761 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4382477Z 
2025-12-04T11:11:26.4382988Z [W1204 11:02:29.183911115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4382996Z 
2025-12-04T11:11:26.4383493Z [W1204 11:02:29.184526715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4383498Z 
2025-12-04T11:11:26.4384009Z [W1204 11:02:29.184712306 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4384014Z 
2025-12-04T11:11:26.4384508Z [W1204 11:02:29.190781903 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4384515Z 
2025-12-04T11:11:26.4385010Z [W1204 11:02:29.191394887 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4385027Z 
2025-12-04T11:11:26.4385523Z [W1204 11:02:29.191577756 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4385527Z 
2025-12-04T11:11:26.4386029Z [W1204 11:02:29.278343259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4386034Z 
2025-12-04T11:11:26.4386544Z [W1204 11:02:29.279120334 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4386551Z 
2025-12-04T11:11:26.4387050Z [W1204 11:02:29.279329305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4387057Z 
2025-12-04T11:11:26.4387564Z [W1204 11:02:29.283250241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4387568Z 
2025-12-04T11:11:26.4388074Z [W1204 11:02:29.283881956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4388079Z 
2025-12-04T11:11:26.4389365Z [W1204 11:02:29.284076030 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4389370Z 
2025-12-04T11:11:26.4389879Z [W1204 11:02:29.289969916 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4389914Z 
2025-12-04T11:11:26.4390434Z [W1204 11:02:29.290782534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4390488Z 
2025-12-04T11:11:26.4390992Z [W1204 11:02:29.290978451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4390997Z 
2025-12-04T11:11:26.4391124Z ('RERUN', {'yellow': True}) [0.4714s] [100%]
2025-12-04T11:11:26.4392386Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 [W1204 11:02:30.625002528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4392392Z 
2025-12-04T11:11:26.4392894Z [W1204 11:02:30.625710320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4392900Z 
2025-12-04T11:11:26.4393415Z [W1204 11:02:30.625903932 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4393422Z 
2025-12-04T11:11:26.4393922Z [W1204 11:02:30.629791944 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4393927Z 
2025-12-04T11:11:26.4394438Z [W1204 11:02:30.630461221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4394443Z 
2025-12-04T11:11:26.4394950Z [W1204 11:02:30.630651561 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4394954Z 
2025-12-04T11:11:26.4395470Z [W1204 11:02:30.636554565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4395477Z 
2025-12-04T11:11:26.4395976Z [W1204 11:02:30.637183459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4395983Z 
2025-12-04T11:11:26.4396495Z [W1204 11:02:30.637368777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4396500Z 
2025-12-04T11:11:26.4396998Z [W1204 11:02:30.723148759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4397003Z 
2025-12-04T11:11:26.4397504Z [W1204 11:02:30.723903833 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4397520Z 
2025-12-04T11:11:26.4398014Z [W1204 11:02:30.724105880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4398022Z 
2025-12-04T11:11:26.4398521Z [W1204 11:02:30.728055583 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4398525Z 
2025-12-04T11:11:26.4399039Z [W1204 11:02:30.728709729 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4399044Z 
2025-12-04T11:11:26.4399541Z [W1204 11:02:30.728909582 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4399545Z 
2025-12-04T11:11:26.4400108Z [W1204 11:02:30.734879426 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4400113Z 
2025-12-04T11:11:26.4400608Z [W1204 11:02:30.735685099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4400642Z 
2025-12-04T11:11:26.4401392Z [W1204 11:02:30.735874321 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4401397Z 
2025-12-04T11:11:26.4401556Z FAILED [0.4438s] [100%]
2025-12-04T11:11:26.4401632Z 
2025-12-04T11:11:26.4401780Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4402283Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4402402Z Traceback (most recent call last):
2025-12-04T11:11:26.4402920Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4403150Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4403602Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4403780Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4404305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4404524Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4404653Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4404658Z 
2025-12-04T11:11:26.4404761Z Expected 1 but got 2.
2025-12-04T11:11:26.4404884Z Absolute difference: 1
2025-12-04T11:11:26.4404991Z Relative difference: 1.0
2025-12-04T11:11:26.4404995Z 
2025-12-04T11:11:26.4405203Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4406096Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4406102Z 
2025-12-04T11:11:26.4406366Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4406593Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4406706Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4407577Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4407816Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4407911Z graph_break []
2025-12-04T11:11:26.4408134Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4409318Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4409433Z   if out == self.unknown_value:
2025-12-04T11:11:26.4410161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4410261Z   warnings.warn(
2025-12-04T11:11:26.4410979Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4411076Z   warnings.warn(
2025-12-04T11:11:26.4411567Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4411698Z Traceback (most recent call last):
2025-12-04T11:11:26.4412281Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4412511Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4413015Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4413175Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4413714Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4413951Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4414080Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4414085Z 
2025-12-04T11:11:26.4414202Z Expected 1 but got 2.
2025-12-04T11:11:26.4414308Z Absolute difference: 1
2025-12-04T11:11:26.4414414Z Relative difference: 1.0
2025-12-04T11:11:26.4414434Z 
2025-12-04T11:11:26.4414647Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4415525Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4415533Z 
2025-12-04T11:11:26.4415810Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4416023Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4416139Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4417020Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4417241Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4417351Z graph_break []
2025-12-04T11:11:26.4417567Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4418745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4418874Z   if out == self.unknown_value:
2025-12-04T11:11:26.4419581Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4419694Z   warnings.warn(
2025-12-04T11:11:26.4420398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4420495Z   warnings.warn(
2025-12-04T11:11:26.4420723Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4420833Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4421068Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4421940Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4422039Z graph_break []
2025-12-04T11:11:26.4422261Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4422973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4423070Z   warnings.warn(
2025-12-04T11:11:26.4423785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4423945Z   warnings.warn(
2025-12-04T11:11:26.4424102Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4424593Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4424876Z Traceback (most recent call last):
2025-12-04T11:11:26.4425390Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4425649Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4426116Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4426279Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4426804Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4427028Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4427157Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4427162Z 
2025-12-04T11:11:26.4427266Z Expected 1 but got 2.
2025-12-04T11:11:26.4427388Z Absolute difference: 1
2025-12-04T11:11:26.4427497Z Relative difference: 1.0
2025-12-04T11:11:26.4427502Z 
2025-12-04T11:11:26.4427728Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4428605Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4428613Z 
2025-12-04T11:11:26.4428875Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4429106Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4429221Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4430117Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4430341Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4430438Z graph_break []
2025-12-04T11:11:26.4430667Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4431847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4431977Z   if out == self.unknown_value:
2025-12-04T11:11:26.4432689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4432792Z   warnings.warn(
2025-12-04T11:11:26.4433515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4433614Z   warnings.warn(
2025-12-04T11:11:26.4433827Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4433955Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4434182Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4435063Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4435159Z graph_break []
2025-12-04T11:11:26.4435371Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4436148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4436246Z   warnings.warn(
2025-12-04T11:11:26.4436969Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4437113Z   warnings.warn(
2025-12-04T11:11:26.4437323Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4437476Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4437698Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4438570Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4438679Z graph_break []
2025-12-04T11:11:26.4438895Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4439616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4439717Z   warnings.warn(
2025-12-04T11:11:26.4440418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4440530Z   warnings.warn(
2025-12-04T11:11:26.4441356Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eddfed0d2b029629.xml -
2025-12-04T11:11:26.4441607Z =========================== short test summary info ============================
2025-12-04T11:11:26.4442521Z FAILED [0.4438s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4442527Z 
2025-12-04T11:11:26.4442632Z Expected 1 but got 2.
2025-12-04T11:11:26.4442751Z Absolute difference: 1
2025-12-04T11:11:26.4442859Z Relative difference: 1.0
2025-12-04T11:11:26.4442864Z 
2025-12-04T11:11:26.4443088Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4443966Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4443973Z 
2025-12-04T11:11:26.4444236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4444425Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4444616Z ================== 1 failed, 10 deselected, 2 rerun in 20.44s ==================
2025-12-04T11:11:26.4444730Z Got exit code 1
2025-12-04T11:11:26.4445524Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4445931Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:11:26.4446378Z W1204 11:02:41.168000 90710 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4447022Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b22078b8c085cdcd.xml
2025-12-04T11:11:26.4447204Z ============================= test session starts ==============================
2025-12-04T11:11:26.4447547Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4447656Z cachedir: .pytest_cache
2025-12-04T11:11:26.4448253Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4448378Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4448484Z configfile: pytest.ini
2025-12-04T11:11:26.4449061Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4449271Z collecting ... collected 58 items / 5 deselected / 53 selected
2025-12-04T11:11:26.4449456Z stepcurrent: skipping 5 already run items.
2025-12-04T11:11:26.4449568Z Running 6 items in this shard
2025-12-04T11:11:26.4449573Z 
2025-12-04T11:11:26.4450417Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7628s] [ 16%]
2025-12-04T11:11:26.4451269Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4392s] [ 16%]
2025-12-04T11:11:26.4452018Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4369s] [ 16%]
2025-12-04T11:11:26.4452025Z 
2025-12-04T11:11:26.4452182Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4452674Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4452795Z Traceback (most recent call last):
2025-12-04T11:11:26.4453313Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4453537Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4454008Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4454167Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4454692Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4454909Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4455036Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4455041Z 
2025-12-04T11:11:26.4455157Z Expected 1 but got 2.
2025-12-04T11:11:26.4455264Z Absolute difference: 1
2025-12-04T11:11:26.4455371Z Relative difference: 1.0
2025-12-04T11:11:26.4455376Z 
2025-12-04T11:11:26.4455597Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4456478Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4456483Z 
2025-12-04T11:11:26.4456759Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4456975Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4457092Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4457971Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4458197Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4458292Z graph_break []
2025-12-04T11:11:26.4458520Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4459241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4459353Z   warnings.warn(
2025-12-04T11:11:26.4460115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4460216Z   warnings.warn(
2025-12-04T11:11:26.4460747Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4460865Z Traceback (most recent call last):
2025-12-04T11:11:26.4461362Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4461630Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4462076Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4462248Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4462772Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4462975Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4463120Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4463128Z 
2025-12-04T11:11:26.4463230Z Expected 1 but got 2.
2025-12-04T11:11:26.4463346Z Absolute difference: 1
2025-12-04T11:11:26.4463452Z Relative difference: 1.0
2025-12-04T11:11:26.4463457Z 
2025-12-04T11:11:26.4463666Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4464559Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4464563Z 
2025-12-04T11:11:26.4464827Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4465052Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4465170Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4466036Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4466269Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4466368Z graph_break []
2025-12-04T11:11:26.4466579Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4467310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4467408Z   warnings.warn(
2025-12-04T11:11:26.4468130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4468230Z   warnings.warn(
2025-12-04T11:11:26.4468440Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4468564Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4468785Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4469669Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4469765Z graph_break []
2025-12-04T11:11:26.4469974Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4470698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4470793Z   warnings.warn(
2025-12-04T11:11:26.4471559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4471672Z   warnings.warn(
2025-12-04T11:11:26.4471813Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4472347Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4472463Z Traceback (most recent call last):
2025-12-04T11:11:26.4472959Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4473233Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4473680Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4473854Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4474383Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4474582Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4474722Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4474729Z 
2025-12-04T11:11:26.4474834Z Expected 1 but got 2.
2025-12-04T11:11:26.4474938Z Absolute difference: 1
2025-12-04T11:11:26.4475056Z Relative difference: 1.0
2025-12-04T11:11:26.4475060Z 
2025-12-04T11:11:26.4475272Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4476166Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4476171Z 
2025-12-04T11:11:26.4476434Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4476649Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4476775Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4477642Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4477877Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4477973Z graph_break []
2025-12-04T11:11:26.4478184Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4478918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4479016Z   warnings.warn(
2025-12-04T11:11:26.4479738Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4479839Z   warnings.warn(
2025-12-04T11:11:26.4480052Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4480176Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4480400Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4481264Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4481374Z graph_break []
2025-12-04T11:11:26.4481653Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4482379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4482475Z   warnings.warn(
2025-12-04T11:11:26.4483265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4483376Z   warnings.warn(
2025-12-04T11:11:26.4483585Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4483728Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4483961Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4484831Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4484970Z graph_break []
2025-12-04T11:11:26.4485177Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4485894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4486004Z   warnings.warn(
2025-12-04T11:11:26.4486709Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4486818Z   warnings.warn(
2025-12-04T11:11:26.4487642Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b22078b8c085cdcd.xml -
2025-12-04T11:11:26.4487814Z =========================== short test summary info ============================
2025-12-04T11:11:26.4488747Z FAILED [0.4369s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4488752Z 
2025-12-04T11:11:26.4488856Z Expected 1 but got 2.
2025-12-04T11:11:26.4488982Z Absolute difference: 1
2025-12-04T11:11:26.4489090Z Relative difference: 1.0
2025-12-04T11:11:26.4489094Z 
2025-12-04T11:11:26.4489305Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4490198Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4490206Z 
2025-12-04T11:11:26.4490469Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4490664Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4490856Z =================== 1 failed, 5 deselected, 2 rerun in 4.67s ===================
2025-12-04T11:11:26.4490954Z Got exit code 1
2025-12-04T11:11:26.4491073Z Retrying single test...
2025-12-04T11:11:26.4491516Z W1204 11:03:00.783000 90879 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4492161Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38e32e50c56cc24f.xml
2025-12-04T11:11:26.4492339Z ============================= test session starts ==============================
2025-12-04T11:11:26.4492686Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4492810Z cachedir: .pytest_cache
2025-12-04T11:11:26.4493321Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4493447Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4493570Z configfile: pytest.ini
2025-12-04T11:11:26.4494103Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4494336Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.4495364Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4495507Z Running 1 items in this shard
2025-12-04T11:11:26.4495512Z 
2025-12-04T11:11:26.4496776Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:06.748836219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4496815Z 
2025-12-04T11:11:26.4497326Z [W1204 11:03:21.404956419 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4497331Z 
2025-12-04T11:11:26.4497852Z [W1204 11:03:21.405208430 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4497858Z 
2025-12-04T11:11:26.4498359Z [W1204 11:03:21.412347114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4498366Z 
2025-12-04T11:11:26.4498878Z [W1204 11:03:21.413016398 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4498883Z 
2025-12-04T11:11:26.4499383Z [W1204 11:03:21.413202601 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4499390Z 
2025-12-04T11:11:26.4499901Z [W1204 11:03:21.419902114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4499906Z 
2025-12-04T11:11:26.4500407Z [W1204 11:03:21.420666810 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4500415Z 
2025-12-04T11:11:26.4501092Z [W1204 11:03:21.420852983 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4501113Z 
2025-12-04T11:11:26.4501620Z [W1204 11:03:21.552806132 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4501624Z 
2025-12-04T11:11:26.4502128Z [W1204 11:03:21.554519524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4502135Z 
2025-12-04T11:11:26.4502648Z [W1204 11:03:21.554735479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4502652Z 
2025-12-04T11:11:26.4503152Z [W1204 11:03:21.558557259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4503160Z 
2025-12-04T11:11:26.4503677Z [W1204 11:03:21.559185258 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4503682Z 
2025-12-04T11:11:26.4504184Z [W1204 11:03:21.559375200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4504188Z 
2025-12-04T11:11:26.4504701Z [W1204 11:03:21.565220358 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4504708Z 
2025-12-04T11:11:26.4505210Z [W1204 11:03:21.565828492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4505214Z 
2025-12-04T11:11:26.4505732Z [W1204 11:03:21.566015789 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4505737Z 
2025-12-04T11:11:26.4506024Z ('RERUN', {'yellow': True}) [19.4420s] [100%]
2025-12-04T11:11:26.4507270Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:22.959910785 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4507320Z 
2025-12-04T11:11:26.4507840Z [W1204 11:03:22.960696728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4507884Z 
2025-12-04T11:11:26.4508388Z [W1204 11:03:22.960902474 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4508393Z 
2025-12-04T11:11:26.4508913Z [W1204 11:03:22.964851671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4508918Z 
2025-12-04T11:11:26.4509425Z [W1204 11:03:22.965483441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4509429Z 
2025-12-04T11:11:26.4509939Z [W1204 11:03:22.965672615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4509946Z 
2025-12-04T11:11:26.4510439Z [W1204 11:03:22.971752252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4510448Z 
2025-12-04T11:11:26.4510957Z [W1204 11:03:22.972365182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4510962Z 
2025-12-04T11:11:26.4511458Z [W1204 11:03:22.972549078 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4511462Z 
2025-12-04T11:11:26.4511961Z [W1204 11:03:22.058594136 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4511979Z 
2025-12-04T11:11:26.4512475Z [W1204 11:03:22.059351734 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4512483Z 
2025-12-04T11:11:26.4512977Z [W1204 11:03:22.059556938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4512984Z 
2025-12-04T11:11:26.4513486Z [W1204 11:03:22.063438320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4513491Z 
2025-12-04T11:11:26.4513986Z [W1204 11:03:22.064075460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4513990Z 
2025-12-04T11:11:26.4514501Z [W1204 11:03:22.064269381 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4514506Z 
2025-12-04T11:11:26.4515005Z [W1204 11:03:22.070241698 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4515011Z 
2025-12-04T11:11:26.4515524Z [W1204 11:03:22.071051380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4515529Z 
2025-12-04T11:11:26.4516030Z [W1204 11:03:22.071243549 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4516035Z 
2025-12-04T11:11:26.4516177Z ('RERUN', {'yellow': True}) [0.4673s] [100%]
2025-12-04T11:11:26.4517464Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:22.402212629 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4517470Z 
2025-12-04T11:11:26.4517971Z [W1204 11:03:22.402914455 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4518020Z 
2025-12-04T11:11:26.4518519Z [W1204 11:03:22.403107773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4518551Z 
2025-12-04T11:11:26.4519050Z [W1204 11:03:22.406973279 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4519055Z 
2025-12-04T11:11:26.4519561Z [W1204 11:03:22.407565278 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4519565Z 
2025-12-04T11:11:26.4520068Z [W1204 11:03:22.407750171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4520072Z 
2025-12-04T11:11:26.4520587Z [W1204 11:03:22.413775335 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4520593Z 
2025-12-04T11:11:26.4521091Z [W1204 11:03:22.414435087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4521097Z 
2025-12-04T11:11:26.4521674Z [W1204 11:03:22.414622748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4521681Z 
2025-12-04T11:11:26.4522178Z [W1204 11:03:22.500771533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4522182Z 
2025-12-04T11:11:26.4522684Z [W1204 11:03:22.501486234 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4522702Z 
2025-12-04T11:11:26.4523196Z [W1204 11:03:22.501684645 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4523203Z 
2025-12-04T11:11:26.4523700Z [W1204 11:03:22.505450890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4523705Z 
2025-12-04T11:11:26.4524214Z [W1204 11:03:22.506052933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4524220Z 
2025-12-04T11:11:26.4524719Z [W1204 11:03:22.506257153 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4524724Z 
2025-12-04T11:11:26.4525237Z [W1204 11:03:22.512070114 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4525241Z 
2025-12-04T11:11:26.4525737Z [W1204 11:03:22.512829068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4525744Z 
2025-12-04T11:11:26.4526250Z [W1204 11:03:22.513019757 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4526254Z 
2025-12-04T11:11:26.4526353Z FAILED [0.4395s] [100%]
2025-12-04T11:11:26.4526358Z 
2025-12-04T11:11:26.4526502Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4527002Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4527122Z Traceback (most recent call last):
2025-12-04T11:11:26.4527631Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4527939Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4528396Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4528600Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4529123Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4529338Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4529501Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4529506Z 
2025-12-04T11:11:26.4529611Z Expected 1 but got 2.
2025-12-04T11:11:26.4529732Z Absolute difference: 1
2025-12-04T11:11:26.4529842Z Relative difference: 1.0
2025-12-04T11:11:26.4529847Z 
2025-12-04T11:11:26.4530056Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4530960Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4530965Z 
2025-12-04T11:11:26.4531228Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4531463Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4531578Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4532449Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4532684Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4532781Z graph_break []
2025-12-04T11:11:26.4533004Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4534190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4534309Z   if out == self.unknown_value:
2025-12-04T11:11:26.4535036Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4535134Z   warnings.warn(
2025-12-04T11:11:26.4535858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4535955Z   warnings.warn(
2025-12-04T11:11:26.4536447Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4536581Z Traceback (most recent call last):
2025-12-04T11:11:26.4537084Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4537321Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4537765Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4537928Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4538465Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4538665Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4538793Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4538798Z 
2025-12-04T11:11:26.4538913Z Expected 1 but got 2.
2025-12-04T11:11:26.4539017Z Absolute difference: 1
2025-12-04T11:11:26.4539135Z Relative difference: 1.0
2025-12-04T11:11:26.4539140Z 
2025-12-04T11:11:26.4539406Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4540289Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4540323Z 
2025-12-04T11:11:26.4540597Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4540812Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4540966Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4541832Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4542053Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4542161Z graph_break []
2025-12-04T11:11:26.4542379Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4543578Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4543694Z   if out == self.unknown_value:
2025-12-04T11:11:26.4544406Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4544519Z   warnings.warn(
2025-12-04T11:11:26.4545221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4545316Z   warnings.warn(
2025-12-04T11:11:26.4545541Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4545657Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4545892Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4546756Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4546851Z graph_break []
2025-12-04T11:11:26.4547073Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4547782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4547890Z   warnings.warn(
2025-12-04T11:11:26.4548594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4548688Z   warnings.warn(
2025-12-04T11:11:26.4548846Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4549340Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4549459Z Traceback (most recent call last):
2025-12-04T11:11:26.4549967Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4550194Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4550657Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4550821Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4551344Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4551615Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4551744Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4551750Z 
2025-12-04T11:11:26.4551866Z Expected 1 but got 2.
2025-12-04T11:11:26.4551974Z Absolute difference: 1
2025-12-04T11:11:26.4552111Z Relative difference: 1.0
2025-12-04T11:11:26.4552116Z 
2025-12-04T11:11:26.4552342Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4553220Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4553256Z 
2025-12-04T11:11:26.4553519Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4553747Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4553859Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4554752Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4554973Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4555072Z graph_break []
2025-12-04T11:11:26.4555297Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4556470Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4556599Z   if out == self.unknown_value:
2025-12-04T11:11:26.4557309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4557480Z   warnings.warn(
2025-12-04T11:11:26.4558295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4567894Z   warnings.warn(
2025-12-04T11:11:26.4568259Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4568401Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4568644Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4569564Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4569683Z graph_break []
2025-12-04T11:11:26.4569910Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4570678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4570786Z   warnings.warn(
2025-12-04T11:11:26.4571520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4571637Z   warnings.warn(
2025-12-04T11:11:26.4571859Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4571975Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4572225Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4573130Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4573245Z graph_break []
2025-12-04T11:11:26.4573608Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4574475Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4574630Z   warnings.warn(
2025-12-04T11:11:26.4575330Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4575426Z   warnings.warn(
2025-12-04T11:11:26.4576293Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38e32e50c56cc24f.xml -
2025-12-04T11:11:26.4576475Z =========================== short test summary info ============================
2025-12-04T11:11:26.4577400Z FAILED [0.4395s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4577409Z 
2025-12-04T11:11:26.4577529Z Expected 1 but got 2.
2025-12-04T11:11:26.4577632Z Absolute difference: 1
2025-12-04T11:11:26.4577737Z Relative difference: 1.0
2025-12-04T11:11:26.4577759Z 
2025-12-04T11:11:26.4577975Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4578855Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4578863Z 
2025-12-04T11:11:26.4579140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4579319Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4579512Z ================== 1 failed, 10 deselected, 2 rerun in 20.38s ==================
2025-12-04T11:11:26.4579623Z Got exit code 1
2025-12-04T11:11:26.4579734Z Retrying single test...
2025-12-04T11:11:26.4580188Z W1204 11:03:33.912000 91053 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4580837Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d85417ecba0abe7a.xml
2025-12-04T11:11:26.4581002Z ============================= test session starts ==============================
2025-12-04T11:11:26.4581360Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4581468Z cachedir: .pytest_cache
2025-12-04T11:11:26.4581994Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4582116Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4582221Z configfile: pytest.ini
2025-12-04T11:11:26.4582765Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4582982Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.4583941Z stepcurrent: skipping 5 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4584069Z Running 1 items in this shard
2025-12-04T11:11:26.4584074Z 
2025-12-04T11:11:26.4585318Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:39.870758741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4585324Z 
2025-12-04T11:11:26.4585915Z [W1204 11:03:54.427769773 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4585921Z 
2025-12-04T11:11:26.4586426Z [W1204 11:03:54.428025975 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4586463Z 
2025-12-04T11:11:26.4586976Z [W1204 11:03:54.435202931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4586981Z 
2025-12-04T11:11:26.4587478Z [W1204 11:03:54.435908744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4587516Z 
2025-12-04T11:11:26.4588028Z [W1204 11:03:54.436097008 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4588032Z 
2025-12-04T11:11:26.4588529Z [W1204 11:03:54.442828957 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4588539Z 
2025-12-04T11:11:26.4589052Z [W1204 11:03:54.443571676 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4589057Z 
2025-12-04T11:11:26.4589556Z [W1204 11:03:54.443751494 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4589561Z 
2025-12-04T11:11:26.4590060Z [W1204 11:03:54.575144107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4590067Z 
2025-12-04T11:11:26.4590583Z [W1204 11:03:54.576850952 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4590588Z 
2025-12-04T11:11:26.4591090Z [W1204 11:03:54.577054720 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4591094Z 
2025-12-04T11:11:26.4591611Z [W1204 11:03:54.580937635 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4591615Z 
2025-12-04T11:11:26.4592115Z [W1204 11:03:54.581568770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4592122Z 
2025-12-04T11:11:26.4592633Z [W1204 11:03:54.581761560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4592639Z 
2025-12-04T11:11:26.4593136Z [W1204 11:03:54.587707393 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4593140Z 
2025-12-04T11:11:26.4593651Z [W1204 11:03:54.588362514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4593656Z 
2025-12-04T11:11:26.4594155Z [W1204 11:03:54.588551033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4594160Z 
2025-12-04T11:11:26.4594289Z ('RERUN', {'yellow': True}) [19.3430s] [100%]
2025-12-04T11:11:26.4595541Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:55.978981816 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4595549Z 
2025-12-04T11:11:26.4596049Z [W1204 11:03:55.979701023 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4596053Z 
2025-12-04T11:11:26.4596562Z [W1204 11:03:55.979892191 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4596567Z 
2025-12-04T11:11:26.4597134Z [W1204 11:03:55.983766657 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4597139Z 
2025-12-04T11:11:26.4597654Z [W1204 11:03:55.984366204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4597687Z 
2025-12-04T11:11:26.4598182Z [W1204 11:03:55.984551403 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4598304Z 
2025-12-04T11:11:26.4598814Z [W1204 11:03:55.990481650 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4598819Z 
2025-12-04T11:11:26.4599314Z [W1204 11:03:55.991081437 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4599319Z 
2025-12-04T11:11:26.4599837Z [W1204 11:03:55.991265748 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4599841Z 
2025-12-04T11:11:26.4600339Z [W1204 11:03:55.076311372 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4600345Z 
2025-12-04T11:11:26.4601062Z [W1204 11:03:55.077047215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4601081Z 
2025-12-04T11:11:26.4601640Z [W1204 11:03:55.077253938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4601646Z 
2025-12-04T11:11:26.4602143Z [W1204 11:03:55.081107706 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4602148Z 
2025-12-04T11:11:26.4602661Z [W1204 11:03:55.081727993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4602666Z 
2025-12-04T11:11:26.4603165Z [W1204 11:03:55.081918302 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4603172Z 
2025-12-04T11:11:26.4603682Z [W1204 11:03:55.087821826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4603687Z 
2025-12-04T11:11:26.4604189Z [W1204 11:03:55.088580956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4604193Z 
2025-12-04T11:11:26.4604704Z [W1204 11:03:55.088768053 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4604709Z 
2025-12-04T11:11:26.4604834Z ('RERUN', {'yellow': True}) [0.4607s] [100%]
2025-12-04T11:11:26.4606075Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:03:55.423006532 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4606095Z 
2025-12-04T11:11:26.4606591Z [W1204 11:03:55.423708090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4606595Z 
2025-12-04T11:11:26.4607094Z [W1204 11:03:55.423902837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4607099Z 
2025-12-04T11:11:26.4607604Z [W1204 11:03:55.427842342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4607608Z 
2025-12-04T11:11:26.4608249Z [W1204 11:03:55.428440610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4608255Z 
2025-12-04T11:11:26.4608767Z [W1204 11:03:55.428624855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4608835Z 
2025-12-04T11:11:26.4609334Z [W1204 11:03:55.434628162 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4609339Z 
2025-12-04T11:11:26.4609850Z [W1204 11:03:55.435223641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4609903Z 
2025-12-04T11:11:26.4610400Z [W1204 11:03:55.435406363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4610405Z 
2025-12-04T11:11:26.4610922Z [W1204 11:03:55.520618941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4610926Z 
2025-12-04T11:11:26.4611426Z [W1204 11:03:55.521370834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4611433Z 
2025-12-04T11:11:26.4611930Z [W1204 11:03:55.521570526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4611948Z 
2025-12-04T11:11:26.4612439Z [W1204 11:03:55.525424699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4612446Z 
2025-12-04T11:11:26.4612940Z [W1204 11:03:55.526031759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4612945Z 
2025-12-04T11:11:26.4613451Z [W1204 11:03:55.526233893 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4613459Z 
2025-12-04T11:11:26.4613952Z [W1204 11:03:55.532216985 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4613957Z 
2025-12-04T11:11:26.4614466Z [W1204 11:03:55.533061159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4614471Z 
2025-12-04T11:11:26.4614963Z [W1204 11:03:55.533255605 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4614970Z 
2025-12-04T11:11:26.4615082Z FAILED [0.4438s] [100%]
2025-12-04T11:11:26.4615087Z 
2025-12-04T11:11:26.4615230Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4615719Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4615855Z Traceback (most recent call last):
2025-12-04T11:11:26.4616356Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4616598Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4617059Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4617218Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4617757Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4617960Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4618086Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4618103Z 
2025-12-04T11:11:26.4618206Z Expected 1 but got 2.
2025-12-04T11:11:26.4618310Z Absolute difference: 1
2025-12-04T11:11:26.4618432Z Relative difference: 1.0
2025-12-04T11:11:26.4618437Z 
2025-12-04T11:11:26.4618705Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4619583Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4619617Z 
2025-12-04T11:11:26.4619894Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4620110Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4620268Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4621140Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4621359Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4621463Z graph_break []
2025-12-04T11:11:26.4621677Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4622867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4622983Z   if out == self.unknown_value:
2025-12-04T11:11:26.4623699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4623808Z   warnings.warn(
2025-12-04T11:11:26.4624513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4624621Z   warnings.warn(
2025-12-04T11:11:26.4625116Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4625236Z Traceback (most recent call last):
2025-12-04T11:11:26.4625747Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4625975Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4626424Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4626603Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4627127Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4627345Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4627473Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4627478Z 
2025-12-04T11:11:26.4627580Z Expected 1 but got 2.
2025-12-04T11:11:26.4627704Z Absolute difference: 1
2025-12-04T11:11:26.4627812Z Relative difference: 1.0
2025-12-04T11:11:26.4627816Z 
2025-12-04T11:11:26.4628028Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4628921Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4628927Z 
2025-12-04T11:11:26.4629192Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4629426Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4629542Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4630413Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4630706Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4630803Z graph_break []
2025-12-04T11:11:26.4631028Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4632237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4632380Z   if out == self.unknown_value:
2025-12-04T11:11:26.4633100Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4633199Z   warnings.warn(
2025-12-04T11:11:26.4633921Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4634023Z   warnings.warn(
2025-12-04T11:11:26.4634237Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4634360Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4634586Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4635466Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4635563Z graph_break []
2025-12-04T11:11:26.4635773Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4636497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4636593Z   warnings.warn(
2025-12-04T11:11:26.4637297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4637404Z   warnings.warn(
2025-12-04T11:11:26.4637547Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4638054Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4638171Z Traceback (most recent call last):
2025-12-04T11:11:26.4638669Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4638908Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4639356Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4639531Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4640063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4640263Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4640408Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4640414Z 
2025-12-04T11:11:26.4640519Z Expected 1 but got 2.
2025-12-04T11:11:26.4640622Z Absolute difference: 1
2025-12-04T11:11:26.4640744Z Relative difference: 1.0
2025-12-04T11:11:26.4640748Z 
2025-12-04T11:11:26.4640962Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4642039Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4642047Z 
2025-12-04T11:11:26.4642312Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4642607Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4642735Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4643600Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4643876Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4643972Z graph_break []
2025-12-04T11:11:26.4644216Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4645410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4645523Z   if out == self.unknown_value:
2025-12-04T11:11:26.4646249Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4646345Z   warnings.warn(
2025-12-04T11:11:26.4647052Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4647166Z   warnings.warn(
2025-12-04T11:11:26.4647379Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4647494Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4647730Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4648597Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4648705Z graph_break []
2025-12-04T11:11:26.4648919Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4649629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4649747Z   warnings.warn(
2025-12-04T11:11:26.4650449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4650563Z   warnings.warn(
2025-12-04T11:11:26.4650771Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4650883Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4651117Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4651988Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4652083Z graph_break []
2025-12-04T11:11:26.4652305Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4653013Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4653122Z   warnings.warn(
2025-12-04T11:11:26.4653822Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4653921Z   warnings.warn(
2025-12-04T11:11:26.4654758Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d85417ecba0abe7a.xml -
2025-12-04T11:11:26.4654927Z =========================== short test summary info ============================
2025-12-04T11:11:26.4655938Z FAILED [0.4438s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4655972Z 
2025-12-04T11:11:26.4656077Z Expected 1 but got 2.
2025-12-04T11:11:26.4656180Z Absolute difference: 1
2025-12-04T11:11:26.4656299Z Relative difference: 1.0
2025-12-04T11:11:26.4656304Z 
2025-12-04T11:11:26.4656515Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4657529Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4657537Z 
2025-12-04T11:11:26.4657899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4658158Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4658416Z ================== 1 failed, 10 deselected, 2 rerun in 20.28s ==================
2025-12-04T11:11:26.4658514Z Got exit code 1
2025-12-04T11:11:26.4659415Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4659901Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:11:26.4660344Z W1204 11:04:06.954000 91227 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4660997Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1802f570a905faf5.xml
2025-12-04T11:11:26.4661160Z ============================= test session starts ==============================
2025-12-04T11:11:26.4661521Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4661630Z cachedir: .pytest_cache
2025-12-04T11:11:26.4662143Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4662280Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4662385Z configfile: pytest.ini
2025-12-04T11:11:26.4662913Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4663139Z collecting ... collected 58 items / 6 deselected / 52 selected
2025-12-04T11:11:26.4663277Z stepcurrent: skipping 6 already run items.
2025-12-04T11:11:26.4663401Z Running 5 items in this shard
2025-12-04T11:11:26.4663406Z 
2025-12-04T11:11:26.4664268Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.1938s] [ 20%]
2025-12-04T11:11:26.4665121Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.8687s] [ 20%]
2025-12-04T11:11:26.4665900Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 FAILED [0.8693s] [ 20%]
2025-12-04T11:11:26.4665906Z 
2025-12-04T11:11:26.4666049Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4666569Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4666689Z Traceback (most recent call last):
2025-12-04T11:11:26.4667191Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4667536Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4667996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4668174Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4668735Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4668939Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4669122Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4669127Z 
2025-12-04T11:11:26.4669234Z Expected 1 but got 2.
2025-12-04T11:11:26.4669338Z Absolute difference: 1
2025-12-04T11:11:26.4669457Z Relative difference: 1.0
2025-12-04T11:11:26.4669462Z 
2025-12-04T11:11:26.4669670Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4670575Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4670581Z 
2025-12-04T11:11:26.4670842Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4671060Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4671186Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4671704Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4671941Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4672036Z graph_break []
2025-12-04T11:11:26.4672247Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4672983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4673083Z   warnings.warn(
2025-12-04T11:11:26.4673804Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4673904Z   warnings.warn(
2025-12-04T11:11:26.4674409Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4674541Z Traceback (most recent call last):
2025-12-04T11:11:26.4675046Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4675273Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4675739Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4675897Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4676435Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4676639Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4676769Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4676774Z 
2025-12-04T11:11:26.4676886Z Expected 1 but got 2.
2025-12-04T11:11:26.4676990Z Absolute difference: 1
2025-12-04T11:11:26.4677095Z Relative difference: 1.0
2025-12-04T11:11:26.4677111Z 
2025-12-04T11:11:26.4677324Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4678215Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4678221Z 
2025-12-04T11:11:26.4678494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4678788Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4678904Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4679434Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4679689Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4679797Z graph_break []
2025-12-04T11:11:26.4680013Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4680765Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4680878Z   warnings.warn(
2025-12-04T11:11:26.4681706Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4681828Z   warnings.warn(
2025-12-04T11:11:26.4682042Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4682157Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4682400Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4682914Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4683009Z graph_break []
2025-12-04T11:11:26.4683241Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4683949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4684066Z   warnings.warn(
2025-12-04T11:11:26.4684772Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4684870Z   warnings.warn(
2025-12-04T11:11:26.4685024Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4685531Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4685650Z Traceback (most recent call last):
2025-12-04T11:11:26.4686162Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4686391Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4686849Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4687009Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4687538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4687750Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4687878Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4687883Z 
2025-12-04T11:11:26.4687999Z Expected 1 but got 2.
2025-12-04T11:11:26.4688104Z Absolute difference: 1
2025-12-04T11:11:26.4688210Z Relative difference: 1.0
2025-12-04T11:11:26.4688215Z 
2025-12-04T11:11:26.4688436Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4689341Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4689346Z 
2025-12-04T11:11:26.4689622Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4689836Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4690026Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4690560Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4690814Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4690908Z graph_break []
2025-12-04T11:11:26.4691129Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4691844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4691990Z   warnings.warn(
2025-12-04T11:11:26.4692699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4692796Z   warnings.warn(
2025-12-04T11:11:26.4693023Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4693136Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4693358Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4693892Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4693992Z graph_break []
2025-12-04T11:11:26.4694217Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4694923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4695019Z   warnings.warn(
2025-12-04T11:11:26.4695732Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4695829Z   warnings.warn(
2025-12-04T11:11:26.4696054Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4696166Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4696389Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4696928Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4697022Z graph_break []
2025-12-04T11:11:26.4697234Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4697953Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4698048Z   warnings.warn(
2025-12-04T11:11:26.4698766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4698862Z   warnings.warn(
2025-12-04T11:11:26.4699684Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1802f570a905faf5.xml -
2025-12-04T11:11:26.4699866Z =========================== short test summary info ============================
2025-12-04T11:11:26.4700787Z FAILED [0.8693s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4700795Z 
2025-12-04T11:11:26.4701093Z Expected 1 but got 2.
2025-12-04T11:11:26.4701209Z Absolute difference: 1
2025-12-04T11:11:26.4701331Z Relative difference: 1.0
2025-12-04T11:11:26.4701336Z 
2025-12-04T11:11:26.4701552Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4702600Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4702606Z 
2025-12-04T11:11:26.4702890Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4703117Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4703327Z =================== 1 failed, 6 deselected, 2 rerun in 5.96s ===================
2025-12-04T11:11:26.4703467Z Got exit code 1
2025-12-04T11:11:26.4703574Z Retrying single test...
2025-12-04T11:11:26.4704031Z W1204 11:04:27.382000 91404 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4704677Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca420a576680224b.xml
2025-12-04T11:11:26.4704841Z ============================= test session starts ==============================
2025-12-04T11:11:26.4705200Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4705309Z cachedir: .pytest_cache
2025-12-04T11:11:26.4705841Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4705966Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4706073Z configfile: pytest.ini
2025-12-04T11:11:26.4706616Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4706834Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.4707812Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4707945Z Running 1 items in this shard
2025-12-04T11:11:26.4707952Z 
2025-12-04T11:11:26.4709216Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:04:31.000323376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4709224Z 
2025-12-04T11:11:26.4709750Z [W1204 11:04:46.196910160 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4709757Z 
2025-12-04T11:11:26.4710262Z [W1204 11:04:46.197171271 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4710267Z 
2025-12-04T11:11:26.4710785Z [W1204 11:04:46.204393783 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4710790Z 
2025-12-04T11:11:26.4711294Z [W1204 11:04:46.205117380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4711298Z 
2025-12-04T11:11:26.4711811Z [W1204 11:04:46.205303649 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4711818Z 
2025-12-04T11:11:26.4712312Z [W1204 11:04:46.212085762 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4712319Z 
2025-12-04T11:11:26.4712829Z [W1204 11:04:46.212726105 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4712834Z 
2025-12-04T11:11:26.4713326Z [W1204 11:04:46.212908166 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4713331Z 
2025-12-04T11:11:26.4713898Z [W1204 11:04:48.157651997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4713915Z 
2025-12-04T11:11:26.4714414Z [W1204 11:04:48.159427203 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4714449Z 
2025-12-04T11:11:26.4714944Z [W1204 11:04:48.159642979 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4714981Z 
2025-12-04T11:11:26.4715495Z [W1204 11:04:48.163627380 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4715499Z 
2025-12-04T11:11:26.4715996Z [W1204 11:04:48.164285044 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4716001Z 
2025-12-04T11:11:26.4716512Z [W1204 11:04:48.164482973 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4716516Z 
2025-12-04T11:11:26.4717013Z [W1204 11:04:48.170476830 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4717020Z 
2025-12-04T11:11:26.4717532Z [W1204 11:04:48.171132422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4717539Z 
2025-12-04T11:11:26.4718035Z [W1204 11:04:48.171324593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4718040Z 
2025-12-04T11:11:26.4718184Z ('RERUN', {'yellow': True}) [19.4180s] [100%]
2025-12-04T11:11:26.4719441Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:04:49.988740256 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4719447Z 
2025-12-04T11:11:26.4719947Z [W1204 11:04:49.989502499 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4719966Z 
2025-12-04T11:11:26.4720460Z [W1204 11:04:49.989698741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4720467Z 
2025-12-04T11:11:26.4720967Z [W1204 11:04:49.993612644 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4720971Z 
2025-12-04T11:11:26.4721541Z [W1204 11:04:49.994441634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4721549Z 
2025-12-04T11:11:26.4722051Z [W1204 11:04:49.994631138 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4722055Z 
2025-12-04T11:11:26.4722565Z [W1204 11:04:49.000614473 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4722572Z 
2025-12-04T11:11:26.4723069Z [W1204 11:04:49.001266611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4723074Z 
2025-12-04T11:11:26.4723589Z [W1204 11:04:49.001451465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4723593Z 
2025-12-04T11:11:26.4724087Z [W1204 11:04:49.088418889 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4724091Z 
2025-12-04T11:11:26.4724666Z [W1204 11:04:49.089219896 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4724685Z 
2025-12-04T11:11:26.4725183Z [W1204 11:04:49.089426241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4725217Z 
2025-12-04T11:11:26.4725720Z [W1204 11:04:49.093407415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4725726Z 
2025-12-04T11:11:26.4726238Z [W1204 11:04:49.094069794 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4726272Z 
2025-12-04T11:11:26.4726771Z [W1204 11:04:49.094277051 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4726775Z 
2025-12-04T11:11:26.4727289Z [W1204 11:04:49.100235056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4727294Z 
2025-12-04T11:11:26.4727787Z [W1204 11:04:49.101077542 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4727795Z 
2025-12-04T11:11:26.4728306Z [W1204 11:04:49.101267821 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4728311Z 
2025-12-04T11:11:26.4728442Z ('RERUN', {'yellow': True}) [0.8908s] [100%]
2025-12-04T11:11:26.4729715Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:04:50.864116671 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4729720Z 
2025-12-04T11:11:26.4730223Z [W1204 11:04:50.864883787 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4730228Z 
2025-12-04T11:11:26.4730723Z [W1204 11:04:50.865080111 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4730743Z 
2025-12-04T11:11:26.4731237Z [W1204 11:04:50.868993425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4731242Z 
2025-12-04T11:11:26.4731740Z [W1204 11:04:50.869684322 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4731746Z 
2025-12-04T11:11:26.4732258Z [W1204 11:04:50.869877254 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4732263Z 
2025-12-04T11:11:26.4732769Z [W1204 11:04:50.875946421 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4732773Z 
2025-12-04T11:11:26.4733285Z [W1204 11:04:50.876615441 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4733291Z 
2025-12-04T11:11:26.4733790Z [W1204 11:04:50.876799963 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4733795Z 
2025-12-04T11:11:26.4734307Z [W1204 11:04:50.962488634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4734314Z 
2025-12-04T11:11:26.4734812Z [W1204 11:04:50.963248245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4734816Z 
2025-12-04T11:11:26.4735327Z [W1204 11:04:50.963448590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4735394Z 
2025-12-04T11:11:26.4735898Z [W1204 11:04:50.967268727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4735902Z 
2025-12-04T11:11:26.4736436Z [W1204 11:04:50.967886517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4736440Z 
2025-12-04T11:11:26.4736955Z [W1204 11:04:50.968078942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4737001Z 
2025-12-04T11:11:26.4737504Z [W1204 11:04:50.973942722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4737509Z 
2025-12-04T11:11:26.4738019Z [W1204 11:04:50.974741331 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4738024Z 
2025-12-04T11:11:26.4738528Z [W1204 11:04:50.974931402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4738533Z 
2025-12-04T11:11:26.4738645Z FAILED [0.8705s] [100%]
2025-12-04T11:11:26.4738652Z 
2025-12-04T11:11:26.4738793Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4739298Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4739433Z Traceback (most recent call last):
2025-12-04T11:11:26.4739938Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4740179Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4740638Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4740803Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4741342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4741544Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4741685Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4741690Z 
2025-12-04T11:11:26.4741792Z Expected 1 but got 2.
2025-12-04T11:11:26.4741897Z Absolute difference: 1
2025-12-04T11:11:26.4742017Z Relative difference: 1.0
2025-12-04T11:11:26.4742025Z 
2025-12-04T11:11:26.4742236Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4743131Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4743149Z 
2025-12-04T11:11:26.4743414Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4743634Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4743763Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4744287Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4744514Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4744622Z graph_break []
2025-12-04T11:11:26.4744834Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4746037Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4746149Z   if out == self.unknown_value:
2025-12-04T11:11:26.4746927Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4747043Z   warnings.warn(
2025-12-04T11:11:26.4747747Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4747890Z   warnings.warn(
2025-12-04T11:11:26.4748394Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4748542Z Traceback (most recent call last):
2025-12-04T11:11:26.4749052Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4749280Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4749734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4749905Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4750428Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4750641Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4750767Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4750772Z 
2025-12-04T11:11:26.4750877Z Expected 1 but got 2.
2025-12-04T11:11:26.4750996Z Absolute difference: 1
2025-12-04T11:11:26.4751106Z Relative difference: 1.0
2025-12-04T11:11:26.4751111Z 
2025-12-04T11:11:26.4751319Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4752228Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4752233Z 
2025-12-04T11:11:26.4752498Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4752726Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4752839Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4753355Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4753592Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4753691Z graph_break []
2025-12-04T11:11:26.4753916Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4755092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4755206Z   if out == self.unknown_value:
2025-12-04T11:11:26.4755931Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4756029Z   warnings.warn(
2025-12-04T11:11:26.4756749Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4756846Z   warnings.warn(
2025-12-04T11:11:26.4757055Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4757182Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4757405Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4757918Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4758023Z graph_break []
2025-12-04T11:11:26.4758295Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4759014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4759141Z   warnings.warn(
2025-12-04T11:11:26.4759845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4759983Z   warnings.warn(
2025-12-04T11:11:26.4760125Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4760641Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4760760Z Traceback (most recent call last):
2025-12-04T11:11:26.4761261Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4761561Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4762011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4762176Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4762715Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4762916Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4763062Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4763067Z 
2025-12-04T11:11:26.4763167Z Expected 1 but got 2.
2025-12-04T11:11:26.4763272Z Absolute difference: 1
2025-12-04T11:11:26.4763392Z Relative difference: 1.0
2025-12-04T11:11:26.4763396Z 
2025-12-04T11:11:26.4763606Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4764523Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4764528Z 
2025-12-04T11:11:26.4764851Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4765066Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4765192Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4765709Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4765931Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4766040Z graph_break []
2025-12-04T11:11:26.4766252Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4767453Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4767567Z   if out == self.unknown_value:
2025-12-04T11:11:26.4768275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4768386Z   warnings.warn(
2025-12-04T11:11:26.4769088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4769199Z   warnings.warn(
2025-12-04T11:11:26.4769412Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4769522Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4769759Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4770340Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4770437Z graph_break []
2025-12-04T11:11:26.4770659Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4771396Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4771541Z   warnings.warn(
2025-12-04T11:11:26.4772246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4772342Z   warnings.warn(
2025-12-04T11:11:26.4772566Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4772679Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4772918Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4773432Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4773530Z graph_break []
2025-12-04T11:11:26.4773759Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4774464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4774565Z   warnings.warn(
2025-12-04T11:11:26.4775284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4775380Z   warnings.warn(
2025-12-04T11:11:26.4776215Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca420a576680224b.xml -
2025-12-04T11:11:26.4776387Z =========================== short test summary info ============================
2025-12-04T11:11:26.4777313Z FAILED [0.8705s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4777335Z 
2025-12-04T11:11:26.4777438Z Expected 1 but got 2.
2025-12-04T11:11:26.4777543Z Absolute difference: 1
2025-12-04T11:11:26.4777668Z Relative difference: 1.0
2025-12-04T11:11:26.4777673Z 
2025-12-04T11:11:26.4777889Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4778783Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4778789Z 
2025-12-04T11:11:26.4779073Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4779252Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4779465Z ================== 1 failed, 10 deselected, 2 rerun in 21.21s ==================
2025-12-04T11:11:26.4779619Z Got exit code 1
2025-12-04T11:11:26.4779725Z Retrying single test...
2025-12-04T11:11:26.4780180Z W1204 11:05:01.816000 91586 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4780832Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9a9f08c6e10d54f7.xml
2025-12-04T11:11:26.4781011Z ============================= test session starts ==============================
2025-12-04T11:11:26.4781353Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4781461Z cachedir: .pytest_cache
2025-12-04T11:11:26.4782061Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4782187Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4782291Z configfile: pytest.ini
2025-12-04T11:11:26.4782966Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4783183Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.4784205Z stepcurrent: skipping 6 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4784319Z Running 1 items in this shard
2025-12-04T11:11:26.4784324Z 
2025-12-04T11:11:26.4785587Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:05:05.431409412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4785607Z 
2025-12-04T11:11:26.4786117Z [W1204 11:05:20.612989526 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4786124Z 
2025-12-04T11:11:26.4786627Z [W1204 11:05:20.613247368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4786635Z 
2025-12-04T11:11:26.4787146Z [W1204 11:05:21.620638634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4787151Z 
2025-12-04T11:11:26.4787653Z [W1204 11:05:21.621390514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4787658Z 
2025-12-04T11:11:26.4788172Z [W1204 11:05:21.621581028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4788177Z 
2025-12-04T11:11:26.4788673Z [W1204 11:05:21.628482846 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4788680Z 
2025-12-04T11:11:26.4789193Z [W1204 11:05:21.629152339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4789200Z 
2025-12-04T11:11:26.4789695Z [W1204 11:05:21.629334412 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4789700Z 
2025-12-04T11:11:26.4790205Z [W1204 11:05:22.578988189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4790210Z 
2025-12-04T11:11:26.4790714Z [W1204 11:05:22.580795907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4790719Z 
2025-12-04T11:11:26.4791216Z [W1204 11:05:22.581014613 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4791238Z 
2025-12-04T11:11:26.4791732Z [W1204 11:05:22.585065347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4791738Z 
2025-12-04T11:11:26.4792233Z [W1204 11:05:22.585761585 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4792238Z 
2025-12-04T11:11:26.4792743Z [W1204 11:05:22.585961679 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4792748Z 
2025-12-04T11:11:26.4793316Z [W1204 11:05:22.592274069 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4793322Z 
2025-12-04T11:11:26.4793832Z [W1204 11:05:22.593010626 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4793867Z 
2025-12-04T11:11:26.4794365Z [W1204 11:05:22.593211046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4794370Z 
2025-12-04T11:11:26.4794539Z ('RERUN', {'yellow': True}) [19.4085s] [100%]
2025-12-04T11:11:26.4795791Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:05:23.403142050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4795797Z 
2025-12-04T11:11:26.4796298Z [W1204 11:05:23.403907558 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4796315Z 
2025-12-04T11:11:26.4796814Z [W1204 11:05:23.404110041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4796821Z 
2025-12-04T11:11:26.4797319Z [W1204 11:05:23.408049590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4797323Z 
2025-12-04T11:11:26.4797835Z [W1204 11:05:23.408835615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4797839Z 
2025-12-04T11:11:26.4798339Z [W1204 11:05:23.409024086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4798344Z 
2025-12-04T11:11:26.4798858Z [W1204 11:05:23.415015354 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4798863Z 
2025-12-04T11:11:26.4799358Z [W1204 11:05:23.415645931 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4799365Z 
2025-12-04T11:11:26.4799872Z [W1204 11:05:23.415831552 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4799877Z 
2025-12-04T11:11:26.4800374Z [W1204 11:05:23.500886908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4800382Z 
2025-12-04T11:11:26.4801115Z [W1204 11:05:23.501603822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4801121Z 
2025-12-04T11:11:26.4801672Z [W1204 11:05:23.501803129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4801678Z 
2025-12-04T11:11:26.4802176Z [W1204 11:05:23.505621832 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4802198Z 
2025-12-04T11:11:26.4802693Z [W1204 11:05:23.506242305 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4802698Z 
2025-12-04T11:11:26.4803192Z [W1204 11:05:23.506432061 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4803199Z 
2025-12-04T11:11:26.4803710Z [W1204 11:05:23.512355703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4803714Z 
2025-12-04T11:11:26.4804213Z [W1204 11:05:23.513130560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4804348Z 
2025-12-04T11:11:26.4804860Z [W1204 11:05:23.513319968 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4804864Z 
2025-12-04T11:11:26.4805036Z ('RERUN', {'yellow': True}) [0.8798s] [100%]
2025-12-04T11:11:26.4806303Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 [W1204 11:05:24.267476347 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4806351Z 
2025-12-04T11:11:26.4806850Z [W1204 11:05:24.268263720 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4806855Z 
2025-12-04T11:11:26.4807367Z [W1204 11:05:24.268464759 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4807372Z 
2025-12-04T11:11:26.4807871Z [W1204 11:05:24.272494294 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4807877Z 
2025-12-04T11:11:26.4808377Z [W1204 11:05:24.273176563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4808394Z 
2025-12-04T11:11:26.4808893Z [W1204 11:05:24.273372890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4808900Z 
2025-12-04T11:11:26.4809397Z [W1204 11:05:24.279395955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4809402Z 
2025-12-04T11:11:26.4809910Z [W1204 11:05:24.280081463 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4809918Z 
2025-12-04T11:11:26.4810416Z [W1204 11:05:24.280275795 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4810420Z 
2025-12-04T11:11:26.4810935Z [W1204 11:05:24.370255688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4810940Z 
2025-12-04T11:11:26.4811435Z [W1204 11:05:24.371073370 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4811442Z 
2025-12-04T11:11:26.4811955Z [W1204 11:05:24.371287641 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4811960Z 
2025-12-04T11:11:26.4812455Z [W1204 11:05:24.375288327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4812459Z 
2025-12-04T11:11:26.4812960Z [W1204 11:05:24.375971980 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4812978Z 
2025-12-04T11:11:26.4813475Z [W1204 11:05:24.376176547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4813482Z 
2025-12-04T11:11:26.4813979Z [W1204 11:05:24.382269564 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4813985Z 
2025-12-04T11:11:26.4814496Z [W1204 11:05:24.383153170 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4814501Z 
2025-12-04T11:11:26.4814998Z [W1204 11:05:24.383353443 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4815003Z 
2025-12-04T11:11:26.4815115Z FAILED [0.8703s] [100%]
2025-12-04T11:11:26.4815184Z 
2025-12-04T11:11:26.4815329Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4815847Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4815999Z Traceback (most recent call last):
2025-12-04T11:11:26.4816503Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4816775Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4817227Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4817389Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4817927Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4818133Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4818277Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4818282Z 
2025-12-04T11:11:26.4818383Z Expected 1 but got 2.
2025-12-04T11:11:26.4818490Z Absolute difference: 1
2025-12-04T11:11:26.4818612Z Relative difference: 1.0
2025-12-04T11:11:26.4818617Z 
2025-12-04T11:11:26.4818828Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4819722Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4819741Z 
2025-12-04T11:11:26.4820002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4820216Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4820342Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4820861Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4821080Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4821191Z graph_break []
2025-12-04T11:11:26.4821400Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4822594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4822710Z   if out == self.unknown_value:
2025-12-04T11:11:26.4823419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4823529Z   warnings.warn(
2025-12-04T11:11:26.4824237Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4824346Z   warnings.warn(
2025-12-04T11:11:26.4824851Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4824971Z Traceback (most recent call last):
2025-12-04T11:11:26.4825480Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4825708Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4826168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4826326Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4826922Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4827139Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4827267Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4827272Z 
2025-12-04T11:11:26.4827404Z Expected 1 but got 2.
2025-12-04T11:11:26.4827520Z Absolute difference: 1
2025-12-04T11:11:26.4827626Z Relative difference: 1.0
2025-12-04T11:11:26.4827631Z 
2025-12-04T11:11:26.4827854Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4828746Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4828780Z 
2025-12-04T11:11:26.4829042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4829270Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4829385Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4829913Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4830135Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4830233Z graph_break []
2025-12-04T11:11:26.4830458Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4831633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4831748Z   if out == self.unknown_value:
2025-12-04T11:11:26.4832472Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4832574Z   warnings.warn(
2025-12-04T11:11:26.4833296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4833396Z   warnings.warn(
2025-12-04T11:11:26.4833608Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4833732Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4833954Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4834484Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4834577Z graph_break []
2025-12-04T11:11:26.4834785Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4835699Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4835801Z   warnings.warn(
2025-12-04T11:11:26.4836510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4836626Z   warnings.warn(
2025-12-04T11:11:26.4836767Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4837286Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.4837407Z Traceback (most recent call last):
2025-12-04T11:11:26.4837907Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4838149Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4838667Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4838846Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4839367Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4839599Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4839739Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4839744Z 
2025-12-04T11:11:26.4839846Z Expected 1 but got 2.
2025-12-04T11:11:26.4839981Z Absolute difference: 1
2025-12-04T11:11:26.4840100Z Relative difference: 1.0
2025-12-04T11:11:26.4840105Z 
2025-12-04T11:11:26.4840314Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4841220Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4841226Z 
2025-12-04T11:11:26.4841570Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4841788Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4841917Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4842440Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4842675Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4842772Z graph_break []
2025-12-04T11:11:26.4842988Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4844189Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4844310Z   if out == self.unknown_value:
2025-12-04T11:11:26.4845050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4845156Z   warnings.warn(
2025-12-04T11:11:26.4845868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4845984Z   warnings.warn(
2025-12-04T11:11:26.4846199Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4846315Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4846557Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4847078Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4847188Z graph_break []
2025-12-04T11:11:26.4847404Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4848118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4848234Z   warnings.warn(
2025-12-04T11:11:26.4848942Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4849054Z   warnings.warn(
2025-12-04T11:11:26.4849270Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4849382Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4849622Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4850144Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.4850313Z graph_break []
2025-12-04T11:11:26.4850539Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4851251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4851394Z   warnings.warn(
2025-12-04T11:11:26.4852103Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4852232Z   warnings.warn(
2025-12-04T11:11:26.4853068Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9a9f08c6e10d54f7.xml -
2025-12-04T11:11:26.4853237Z =========================== short test summary info ============================
2025-12-04T11:11:26.4854188Z FAILED [0.8703s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4854194Z 
2025-12-04T11:11:26.4854298Z Expected 1 but got 2.
2025-12-04T11:11:26.4854406Z Absolute difference: 1
2025-12-04T11:11:26.4854526Z Relative difference: 1.0
2025-12-04T11:11:26.4854531Z 
2025-12-04T11:11:26.4854744Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4855632Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4855652Z 
2025-12-04T11:11:26.4855911Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4856086Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4856299Z ================== 1 failed, 10 deselected, 2 rerun in 21.19s ==================
2025-12-04T11:11:26.4856394Z Got exit code 1
2025-12-04T11:11:26.4857203Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.4857622Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:11:26.4858054Z W1204 11:05:35.814000 91768 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4858713Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c59271afe170d67.xml
2025-12-04T11:11:26.4858873Z ============================= test session starts ==============================
2025-12-04T11:11:26.4859215Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4859339Z cachedir: .pytest_cache
2025-12-04T11:11:26.4859850Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4859983Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4860089Z configfile: pytest.ini
2025-12-04T11:11:26.4860616Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4860843Z collecting ... collected 58 items / 7 deselected / 51 selected
2025-12-04T11:11:26.4860981Z stepcurrent: skipping 7 already run items.
2025-12-04T11:11:26.4861091Z Running 4 items in this shard
2025-12-04T11:11:26.4861109Z 
2025-12-04T11:11:26.4862359Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 W1204 11:05:41.310000 91768 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4862490Z ('RERUN', {'yellow': True}) [3.8797s] [ 25%]
2025-12-04T11:11:26.4863344Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.5074s] [ 25%]
2025-12-04T11:11:26.4864140Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 FAILED [0.5146s] [ 25%]
2025-12-04T11:11:26.4864177Z 
2025-12-04T11:11:26.4864328Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4864825Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4864944Z Traceback (most recent call last):
2025-12-04T11:11:26.4865459Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4865685Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4866155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4866320Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4866843Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4867063Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4867193Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4867198Z 
2025-12-04T11:11:26.4867302Z Expected 1 but got 0.
2025-12-04T11:11:26.4867417Z Absolute difference: 1
2025-12-04T11:11:26.4867523Z Relative difference: 1.0
2025-12-04T11:11:26.4867528Z 
2025-12-04T11:11:26.4867747Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4868634Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4868640Z 
2025-12-04T11:11:26.4868904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4869129Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4869241Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4869937Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4870161Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4870256Z graph_break []
2025-12-04T11:11:26.4870384Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4870601Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4871320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4871433Z   warnings.warn(
2025-12-04T11:11:26.4872141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4872251Z   warnings.warn(
2025-12-04T11:11:26.4872746Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4872866Z Traceback (most recent call last):
2025-12-04T11:11:26.4873376Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4873603Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4874150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4874312Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4874836Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4875077Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4875207Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4875212Z 
2025-12-04T11:11:26.4875345Z Expected 1 but got 0.
2025-12-04T11:11:26.4875463Z Absolute difference: 1
2025-12-04T11:11:26.4875570Z Relative difference: 1.0
2025-12-04T11:11:26.4875575Z 
2025-12-04T11:11:26.4875800Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4876693Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4876702Z 
2025-12-04T11:11:26.4876963Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4877194Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4877310Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4878006Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4878228Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4878323Z graph_break []
2025-12-04T11:11:26.4878454Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4878664Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4879385Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4879495Z   warnings.warn(
2025-12-04T11:11:26.4880206Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4880320Z   warnings.warn(
2025-12-04T11:11:26.4880531Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4880642Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4880876Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4881618Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4881733Z graph_break []
2025-12-04T11:11:26.4881852Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4882063Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4882792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4882890Z   warnings.warn(
2025-12-04T11:11:26.4883593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4883705Z   warnings.warn(
2025-12-04T11:11:26.4883848Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4884367Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4884485Z Traceback (most recent call last):
2025-12-04T11:11:26.4884982Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4885225Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4885754Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4885916Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4886483Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4886684Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4886825Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4886860Z 
2025-12-04T11:11:26.4886960Z Expected 1 but got 0.
2025-12-04T11:11:26.4887063Z Absolute difference: 1
2025-12-04T11:11:26.4887186Z Relative difference: 1.0
2025-12-04T11:11:26.4887191Z 
2025-12-04T11:11:26.4887398Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4888299Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4888304Z 
2025-12-04T11:11:26.4888565Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4888779Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4888906Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4889579Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4889813Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4889908Z graph_break []
2025-12-04T11:11:26.4890025Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4890249Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4890966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4891062Z   warnings.warn(
2025-12-04T11:11:26.4891780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4891879Z   warnings.warn(
2025-12-04T11:11:26.4892102Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4892212Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4892436Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4893123Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4893217Z graph_break []
2025-12-04T11:11:26.4893335Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4893559Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4894272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4894384Z   warnings.warn(
2025-12-04T11:11:26.4895087Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4895183Z   warnings.warn(
2025-12-04T11:11:26.4895404Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4895515Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4895736Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4896431Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4896599Z graph_break []
2025-12-04T11:11:26.4896734Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4896944Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4897651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4897797Z   warnings.warn(
2025-12-04T11:11:26.4898500Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4898644Z   warnings.warn(
2025-12-04T11:11:26.4899468Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c59271afe170d67.xml -
2025-12-04T11:11:26.4899638Z =========================== short test summary info ============================
2025-12-04T11:11:26.4900573Z FAILED [0.5146s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4900580Z 
2025-12-04T11:11:26.4900683Z Expected 1 but got 0.
2025-12-04T11:11:26.4900797Z Absolute difference: 1
2025-12-04T11:11:26.4901229Z Relative difference: 1.0
2025-12-04T11:11:26.4901235Z 
2025-12-04T11:11:26.4901447Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4902349Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4902354Z 
2025-12-04T11:11:26.4902616Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4902806Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4903000Z =================== 1 failed, 7 deselected, 2 rerun in 4.93s ===================
2025-12-04T11:11:26.4903102Z Got exit code 1
2025-12-04T11:11:26.4903221Z Retrying single test...
2025-12-04T11:11:26.4903655Z W1204 11:05:55.446000 91945 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4904304Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb71b131031d8408.xml
2025-12-04T11:11:26.4904482Z ============================= test session starts ==============================
2025-12-04T11:11:26.4904825Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4904948Z cachedir: .pytest_cache
2025-12-04T11:11:26.4905457Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4905583Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4905703Z configfile: pytest.ini
2025-12-04T11:11:26.4906232Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4906449Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.4907424Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4907538Z Running 1 items in this shard
2025-12-04T11:11:26.4907543Z 
2025-12-04T11:11:26.4908803Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:00.441921241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4908951Z 
2025-12-04T11:11:26.4909465Z [W1204 11:06:16.056761902 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4909470Z 
2025-12-04T11:11:26.4910033Z [W1204 11:06:16.057022595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4910038Z 
2025-12-04T11:11:26.4910540Z [W1204 11:06:16.064989326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4910621Z 
2025-12-04T11:11:26.4911141Z [W1204 11:06:16.065843956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4911146Z 
2025-12-04T11:11:26.4911645Z [W1204 11:06:16.066030596 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4911649Z 
2025-12-04T11:11:26.4912153Z [W1204 11:06:16.073356303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4912173Z 
2025-12-04T11:11:26.4912672Z [W1204 11:06:16.074007942 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4912678Z 
2025-12-04T11:11:26.4913180Z [W1204 11:06:16.074201050 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4913187Z 
2025-12-04T11:11:26.4913657Z W1204 11:06:16.564000 91945 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4914159Z [W1204 11:06:16.260632875 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4914163Z 
2025-12-04T11:11:26.4914684Z [W1204 11:06:16.262356546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4914688Z 
2025-12-04T11:11:26.4915187Z [W1204 11:06:16.262560770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4915193Z 
2025-12-04T11:11:26.4915703Z [W1204 11:06:16.267025115 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4915707Z 
2025-12-04T11:11:26.4916208Z [W1204 11:06:16.267638604 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4916213Z 
2025-12-04T11:11:26.4916725Z [W1204 11:06:16.267828410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4916729Z 
2025-12-04T11:11:26.4917230Z [W1204 11:06:16.274289247 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4917235Z 
2025-12-04T11:11:26.4917736Z [W1204 11:06:16.274902933 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4917744Z 
2025-12-04T11:11:26.4918257Z [W1204 11:06:16.275089492 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4918263Z 
2025-12-04T11:11:26.4918395Z ('RERUN', {'yellow': True}) [19.5066s] [100%]
2025-12-04T11:11:26.4919660Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:17.715977763 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4919666Z 
2025-12-04T11:11:26.4920231Z [W1204 11:06:17.716671350 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4920237Z 
2025-12-04T11:11:26.4920752Z [W1204 11:06:17.716873571 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4920787Z 
2025-12-04T11:11:26.4921285Z [W1204 11:06:17.721635021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4921289Z 
2025-12-04T11:11:26.4921864Z [W1204 11:06:17.722242987 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4921907Z 
2025-12-04T11:11:26.4922406Z [W1204 11:06:17.722427648 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4922410Z 
2025-12-04T11:11:26.4922915Z [W1204 11:06:17.728965216 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4922937Z 
2025-12-04T11:11:26.4923434Z [W1204 11:06:17.729573685 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4923441Z 
2025-12-04T11:11:26.4923940Z [W1204 11:06:17.729757201 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4923945Z 
2025-12-04T11:11:26.4924456Z [W1204 11:06:17.832629080 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4924463Z 
2025-12-04T11:11:26.4924956Z [W1204 11:06:17.833434828 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4924961Z 
2025-12-04T11:11:26.4925471Z [W1204 11:06:17.833650672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4925480Z 
2025-12-04T11:11:26.4925979Z [W1204 11:06:17.838238860 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4925983Z 
2025-12-04T11:11:26.4926502Z [W1204 11:06:17.838921872 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4926506Z 
2025-12-04T11:11:26.4927007Z [W1204 11:06:17.839116695 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4927014Z 
2025-12-04T11:11:26.4927524Z [W1204 11:06:17.845755241 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4927529Z 
2025-12-04T11:11:26.4928026Z [W1204 11:06:17.846409992 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4928030Z 
2025-12-04T11:11:26.4928527Z [W1204 11:06:17.846600365 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4928546Z 
2025-12-04T11:11:26.4928674Z ('RERUN', {'yellow': True}) [0.5327s] [100%]
2025-12-04T11:11:26.4929925Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:17.224991281 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4929932Z 
2025-12-04T11:11:26.4930440Z [W1204 11:06:17.225740282 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4930445Z 
2025-12-04T11:11:26.4930941Z [W1204 11:06:17.225937479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4931014Z 
2025-12-04T11:11:26.4931523Z [W1204 11:06:17.230701523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4931527Z 
2025-12-04T11:11:26.4932054Z [W1204 11:06:17.231320251 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4932059Z 
2025-12-04T11:11:26.4932570Z [W1204 11:06:17.231507341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4932605Z 
2025-12-04T11:11:26.4933102Z [W1204 11:06:17.237938082 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4933106Z 
2025-12-04T11:11:26.4933603Z [W1204 11:06:17.238548021 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4933620Z 
2025-12-04T11:11:26.4934121Z [W1204 11:06:17.238731677 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4934126Z 
2025-12-04T11:11:26.4934623Z [W1204 11:06:17.340810390 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4934629Z 
2025-12-04T11:11:26.4935138Z [W1204 11:06:17.341543255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4935145Z 
2025-12-04T11:11:26.4935744Z [W1204 11:06:17.341744221 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4935749Z 
2025-12-04T11:11:26.4936260Z [W1204 11:06:17.346425265 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4936265Z 
2025-12-04T11:11:26.4936768Z [W1204 11:06:17.347075578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4936773Z 
2025-12-04T11:11:26.4937282Z [W1204 11:06:17.347270749 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4937289Z 
2025-12-04T11:11:26.4937785Z [W1204 11:06:17.355793590 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4937792Z 
2025-12-04T11:11:26.4938299Z [W1204 11:06:17.356944014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4938304Z 
2025-12-04T11:11:26.4938807Z [W1204 11:06:17.357151327 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4938811Z 
2025-12-04T11:11:26.4938911Z FAILED [0.5113s] [100%]
2025-12-04T11:11:26.4938915Z 
2025-12-04T11:11:26.4939070Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.4939568Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4939704Z Traceback (most recent call last):
2025-12-04T11:11:26.4940204Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4940431Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4940903Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4941065Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4941604Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4941873Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4942005Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4942010Z 
2025-12-04T11:11:26.4942124Z Expected 1 but got 0.
2025-12-04T11:11:26.4942227Z Absolute difference: 1
2025-12-04T11:11:26.4942367Z Relative difference: 1.0
2025-12-04T11:11:26.4942372Z 
2025-12-04T11:11:26.4942590Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4943474Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4943510Z 
2025-12-04T11:11:26.4943785Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4944000Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4944113Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4944805Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4945028Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4945138Z graph_break []
2025-12-04T11:11:26.4945254Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4945465Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4946663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4946780Z   if out == self.unknown_value:
2025-12-04T11:11:26.4947510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4947613Z   warnings.warn(
2025-12-04T11:11:26.4948323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4948437Z   warnings.warn(
2025-12-04T11:11:26.4948935Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4949055Z Traceback (most recent call last):
2025-12-04T11:11:26.4949566Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4949795Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4950255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4950414Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4950941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4951155Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4951282Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4951289Z 
2025-12-04T11:11:26.4951404Z Expected 1 but got 0.
2025-12-04T11:11:26.4951507Z Absolute difference: 1
2025-12-04T11:11:26.4951613Z Relative difference: 1.0
2025-12-04T11:11:26.4951617Z 
2025-12-04T11:11:26.4951837Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4952720Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4952724Z 
2025-12-04T11:11:26.4952988Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4953277Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4953391Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4954085Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4954359Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4954454Z graph_break []
2025-12-04T11:11:26.4954590Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4954805Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4956030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4956144Z   if out == self.unknown_value:
2025-12-04T11:11:26.4956866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4956979Z   warnings.warn(
2025-12-04T11:11:26.4957689Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4957805Z   warnings.warn(
2025-12-04T11:11:26.4958022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4958135Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4958376Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4959081Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4959212Z graph_break []
2025-12-04T11:11:26.4959385Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4959703Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4960513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4960615Z   warnings.warn(
2025-12-04T11:11:26.4961407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4961586Z   warnings.warn(
2025-12-04T11:11:26.4961732Z =================================== FAILURES ===================================
2025-12-04T11:11:26.4962225Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.4962360Z Traceback (most recent call last):
2025-12-04T11:11:26.4962863Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.4963103Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.4963549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.4963711Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.4964247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.4964448Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.4964577Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4964595Z 
2025-12-04T11:11:26.4964699Z Expected 1 but got 0.
2025-12-04T11:11:26.4964804Z Absolute difference: 1
2025-12-04T11:11:26.4964926Z Relative difference: 1.0
2025-12-04T11:11:26.4964930Z 
2025-12-04T11:11:26.4965141Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4966213Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4966232Z 
2025-12-04T11:11:26.4966528Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4966742Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4966871Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4967547Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4967804Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4967913Z graph_break []
2025-12-04T11:11:26.4968031Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4968244Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4969444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.4969559Z   if out == self.unknown_value:
2025-12-04T11:11:26.4970284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4970386Z   warnings.warn(
2025-12-04T11:11:26.4971097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4971215Z   warnings.warn(
2025-12-04T11:11:26.4971429Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4971555Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4971780Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4972461Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4972571Z graph_break []
2025-12-04T11:11:26.4972686Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4972896Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4973619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4973720Z   warnings.warn(
2025-12-04T11:11:26.4974437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4974534Z   warnings.warn(
2025-12-04T11:11:26.4974746Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.4974869Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.4975094Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.4975780Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.4975873Z graph_break []
2025-12-04T11:11:26.4975991Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.4976215Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.4976925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4977032Z   warnings.warn(
2025-12-04T11:11:26.4977816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.4977915Z   warnings.warn(
2025-12-04T11:11:26.4978750Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb71b131031d8408.xml -
2025-12-04T11:11:26.4978959Z =========================== short test summary info ============================
2025-12-04T11:11:26.4979882Z FAILED [0.5113s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.4979936Z 
2025-12-04T11:11:26.4980041Z Expected 1 but got 0.
2025-12-04T11:11:26.4980146Z Absolute difference: 1
2025-12-04T11:11:26.4980270Z Relative difference: 1.0
2025-12-04T11:11:26.4980275Z 
2025-12-04T11:11:26.4980489Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.4981375Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4981381Z 
2025-12-04T11:11:26.4981663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.4981843Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.4982052Z ================== 1 failed, 10 deselected, 2 rerun in 20.58s ==================
2025-12-04T11:11:26.4982153Z Got exit code 1
2025-12-04T11:11:26.4982259Z Retrying single test...
2025-12-04T11:11:26.4982715Z W1204 11:06:28.890000 92127 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4983413Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-565cf24db94440d1.xml
2025-12-04T11:11:26.4983598Z ============================= test session starts ==============================
2025-12-04T11:11:26.4983944Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.4984053Z cachedir: .pytest_cache
2025-12-04T11:11:26.4984578Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.4984700Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.4984807Z configfile: pytest.ini
2025-12-04T11:11:26.4985356Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.4985576Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.4986558Z stepcurrent: skipping 7 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.4986672Z Running 1 items in this shard
2025-12-04T11:11:26.4986677Z 
2025-12-04T11:11:26.4987942Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:34.961743955 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4987964Z 
2025-12-04T11:11:26.4988470Z [W1204 11:06:49.983476923 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4988478Z 
2025-12-04T11:11:26.4988980Z [W1204 11:06:49.983738171 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4988985Z 
2025-12-04T11:11:26.4989588Z [W1204 11:06:49.991654007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4989593Z 
2025-12-04T11:11:26.4990092Z [W1204 11:06:49.992527289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4990130Z 
2025-12-04T11:11:26.4990641Z [W1204 11:06:49.992715880 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4990646Z 
2025-12-04T11:11:26.4991145Z [W1204 11:06:49.000186837 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4991178Z 
2025-12-04T11:11:26.4991688Z [W1204 11:06:49.000920630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4991693Z 
2025-12-04T11:11:26.4992198Z [W1204 11:06:49.001107807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4992207Z 
2025-12-04T11:11:26.4992666Z W1204 11:06:49.492000 92127 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.4993169Z [W1204 11:06:49.190514124 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4993176Z 
2025-12-04T11:11:26.4993680Z [W1204 11:06:49.192252014 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4993687Z 
2025-12-04T11:11:26.4994202Z [W1204 11:06:49.192466941 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4994206Z 
2025-12-04T11:11:26.4994708Z [W1204 11:06:49.197113156 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4994713Z 
2025-12-04T11:11:26.4995228Z [W1204 11:06:49.197786576 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4995233Z 
2025-12-04T11:11:26.4995731Z [W1204 11:06:49.197980424 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4995738Z 
2025-12-04T11:11:26.4996247Z [W1204 11:06:49.204602007 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4996253Z 
2025-12-04T11:11:26.4996749Z [W1204 11:06:49.205263362 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4996753Z 
2025-12-04T11:11:26.4997266Z [W1204 11:06:49.205453967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4997270Z 
2025-12-04T11:11:26.4997399Z ('RERUN', {'yellow': True}) [18.9565s] [100%]
2025-12-04T11:11:26.4998641Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:50.655051408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4998661Z 
2025-12-04T11:11:26.4999161Z [W1204 11:06:50.655794263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4999168Z 
2025-12-04T11:11:26.4999664Z [W1204 11:06:50.655991688 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.4999669Z 
2025-12-04T11:11:26.5000177Z [W1204 11:06:50.660801611 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5000182Z 
2025-12-04T11:11:26.5000783Z [W1204 11:06:50.661450424 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5000788Z 
2025-12-04T11:11:26.5001581Z [W1204 11:06:50.661640073 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5001675Z 
2025-12-04T11:11:26.5002177Z [W1204 11:06:50.668187189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5002181Z 
2025-12-04T11:11:26.5002736Z [W1204 11:06:50.668807784 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5002741Z 
2025-12-04T11:11:26.5003235Z [W1204 11:06:50.668994484 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5003239Z 
2025-12-04T11:11:26.5003758Z [W1204 11:06:50.774696537 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5003764Z 
2025-12-04T11:11:26.5004264Z [W1204 11:06:50.775467245 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5004271Z 
2025-12-04T11:11:26.5004767Z [W1204 11:06:50.775674185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5004790Z 
2025-12-04T11:11:26.5005286Z [W1204 11:06:50.780331145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5005292Z 
2025-12-04T11:11:26.5005787Z [W1204 11:06:50.781003320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5005791Z 
2025-12-04T11:11:26.5006306Z [W1204 11:06:50.781198546 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5006311Z 
2025-12-04T11:11:26.5006806Z [W1204 11:06:50.787816984 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5006813Z 
2025-12-04T11:11:26.5007322Z [W1204 11:06:50.788502090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5007327Z 
2025-12-04T11:11:26.5007824Z [W1204 11:06:50.788695536 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5007831Z 
2025-12-04T11:11:26.5007974Z ('RERUN', {'yellow': True}) [0.5439s] [100%]
2025-12-04T11:11:26.5009230Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 [W1204 11:06:50.174239687 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5009237Z 
2025-12-04T11:11:26.5009736Z [W1204 11:06:50.174994514 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5009762Z 
2025-12-04T11:11:26.5010263Z [W1204 11:06:50.175194290 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5010267Z 
2025-12-04T11:11:26.5010766Z [W1204 11:06:50.180017002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5010772Z 
2025-12-04T11:11:26.5011288Z [W1204 11:06:50.180660924 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5011293Z 
2025-12-04T11:11:26.5011793Z [W1204 11:06:50.180848909 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5011905Z 
2025-12-04T11:11:26.5012416Z [W1204 11:06:50.187406964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5012421Z 
2025-12-04T11:11:26.5012950Z [W1204 11:06:50.188036148 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5012955Z 
2025-12-04T11:11:26.5013469Z [W1204 11:06:50.188220274 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5013509Z 
2025-12-04T11:11:26.5014010Z [W1204 11:06:50.294881054 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5014014Z 
2025-12-04T11:11:26.5014529Z [W1204 11:06:50.296109568 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5014533Z 
2025-12-04T11:11:26.5015032Z [W1204 11:06:50.296321565 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5015037Z 
2025-12-04T11:11:26.5015534Z [W1204 11:06:50.301708418 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5015555Z 
2025-12-04T11:11:26.5016054Z [W1204 11:06:50.302778422 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5016061Z 
2025-12-04T11:11:26.5016556Z [W1204 11:06:50.302978897 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5016561Z 
2025-12-04T11:11:26.5017071Z [W1204 11:06:50.310077266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5017075Z 
2025-12-04T11:11:26.5017575Z [W1204 11:06:50.310792521 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5017579Z 
2025-12-04T11:11:26.5018093Z [W1204 11:06:50.310985693 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5018099Z 
2025-12-04T11:11:26.5018198Z FAILED [0.5210s] [100%]
2025-12-04T11:11:26.5018203Z 
2025-12-04T11:11:26.5018356Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.5018853Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5018973Z Traceback (most recent call last):
2025-12-04T11:11:26.5019486Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5019714Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5020171Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5020346Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5020876Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5021090Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5021218Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5021226Z 
2025-12-04T11:11:26.5021329Z Expected 1 but got 0.
2025-12-04T11:11:26.5021447Z Absolute difference: 1
2025-12-04T11:11:26.5021553Z Relative difference: 1.0
2025-12-04T11:11:26.5021558Z 
2025-12-04T11:11:26.5021781Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5022754Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5022760Z 
2025-12-04T11:11:26.5023025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5023282Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5023395Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5024090Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5024341Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5024438Z graph_break []
2025-12-04T11:11:26.5024573Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.5024785Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5025972Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5026101Z   if out == self.unknown_value:
2025-12-04T11:11:26.5026816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5026927Z   warnings.warn(
2025-12-04T11:11:26.5027632Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5027733Z   warnings.warn(
2025-12-04T11:11:26.5028240Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5028361Z Traceback (most recent call last):
2025-12-04T11:11:26.5028875Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5029100Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5029546Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5029722Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5030247Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5030449Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5030590Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5030595Z 
2025-12-04T11:11:26.5030697Z Expected 1 but got 0.
2025-12-04T11:11:26.5030814Z Absolute difference: 1
2025-12-04T11:11:26.5030919Z Relative difference: 1.0
2025-12-04T11:11:26.5030923Z 
2025-12-04T11:11:26.5031137Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5032048Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5032056Z 
2025-12-04T11:11:26.5032317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5032540Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5032654Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5033337Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5033570Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5033665Z graph_break []
2025-12-04T11:11:26.5033786Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.5034084Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5035265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5035419Z   if out == self.unknown_value:
2025-12-04T11:11:26.5036130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5036259Z   warnings.warn(
2025-12-04T11:11:26.5036975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5037071Z   warnings.warn(
2025-12-04T11:11:26.5037298Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5037410Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5037639Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5038330Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5038427Z graph_break []
2025-12-04T11:11:26.5038548Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.5038771Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5039479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5039588Z   warnings.warn(
2025-12-04T11:11:26.5040292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5040387Z   warnings.warn(
2025-12-04T11:11:26.5040546Z =================================== FAILURES ===================================
2025-12-04T11:11:26.5041037Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5041174Z Traceback (most recent call last):
2025-12-04T11:11:26.5041741Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5041973Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5042438Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5042598Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5043119Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5043340Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5043470Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5043475Z 
2025-12-04T11:11:26.5043592Z Expected 1 but got 0.
2025-12-04T11:11:26.5043695Z Absolute difference: 1
2025-12-04T11:11:26.5043804Z Relative difference: 1.0
2025-12-04T11:11:26.5043808Z 
2025-12-04T11:11:26.5044030Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5044911Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5044918Z 
2025-12-04T11:11:26.5045195Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5045408Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5045520Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5046303Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5046525Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5046652Z graph_break []
2025-12-04T11:11:26.5046780Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.5046988Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5048182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5048328Z   if out == self.unknown_value:
2025-12-04T11:11:26.5049040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5049154Z   warnings.warn(
2025-12-04T11:11:26.5049864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5049976Z   warnings.warn(
2025-12-04T11:11:26.5050190Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5050301Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5050539Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5051216Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5051312Z graph_break []
2025-12-04T11:11:26.5051444Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.5051656Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5052381Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5052478Z   warnings.warn(
2025-12-04T11:11:26.5053183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5053297Z   warnings.warn(
2025-12-04T11:11:26.5053508Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5053623Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5053860Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5054536Z inductor [('pattern_matcher_count', 6), ('pattern_matcher_nodes', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5054646Z graph_break []
2025-12-04T11:11:26.5054766Z aten_mm_info [('aten.mm_32_72_1024', 2)]
2025-12-04T11:11:26.5054981Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5055705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5055806Z   warnings.warn(
2025-12-04T11:11:26.5056522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5056620Z   warnings.warn(
2025-12-04T11:11:26.5057444Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-565cf24db94440d1.xml -
2025-12-04T11:11:26.5057628Z =========================== short test summary info ============================
2025-12-04T11:11:26.5058612Z FAILED [0.5210s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5058618Z 
2025-12-04T11:11:26.5058735Z Expected 1 but got 0.
2025-12-04T11:11:26.5058842Z Absolute difference: 1
2025-12-04T11:11:26.5058979Z Relative difference: 1.0
2025-12-04T11:11:26.5058984Z 
2025-12-04T11:11:26.5059209Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5060097Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5060132Z 
2025-12-04T11:11:26.5060407Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5060582Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.5060777Z ================== 1 failed, 10 deselected, 2 rerun in 20.05s ==================
2025-12-04T11:11:26.5060895Z Got exit code 1
2025-12-04T11:11:26.5061696Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5062100Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:11:26.5062547Z W1204 11:07:01.958000 92309 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.5063193Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-607f169455f7ccc0.xml
2025-12-04T11:11:26.5063371Z ============================= test session starts ==============================
2025-12-04T11:11:26.5063711Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.5063819Z cachedir: .pytest_cache
2025-12-04T11:11:26.5064391Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.5064514Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.5064630Z configfile: pytest.ini
2025-12-04T11:11:26.5065158Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.5065368Z collecting ... collected 58 items / 8 deselected / 50 selected
2025-12-04T11:11:26.5065519Z stepcurrent: skipping 8 already run items.
2025-12-04T11:11:26.5065628Z Running 3 items in this shard
2025-12-04T11:11:26.5065633Z 
2025-12-04T11:11:26.5066481Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.7630s] [ 33%]
2025-12-04T11:11:26.5067330Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4101s] [ 33%]
2025-12-04T11:11:26.5068090Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 FAILED [0.4080s] [ 33%]
2025-12-04T11:11:26.5068098Z 
2025-12-04T11:11:26.5068250Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.5068749Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.5068884Z Traceback (most recent call last):
2025-12-04T11:11:26.5069385Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5069611Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5070155Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5070321Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5070865Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5071101Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5071233Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5071238Z 
2025-12-04T11:11:26.5071355Z Expected 1 but got 2.
2025-12-04T11:11:26.5071489Z Absolute difference: 1
2025-12-04T11:11:26.5071603Z Relative difference: 1.0
2025-12-04T11:11:26.5071608Z 
2025-12-04T11:11:26.5071833Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5072714Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5072719Z 
2025-12-04T11:11:26.5073002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5073220Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5073336Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5073868Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5074090Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5074202Z graph_break []
2025-12-04T11:11:26.5074415Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5075134Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5075247Z   warnings.warn(
2025-12-04T11:11:26.5075960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5076056Z   warnings.warn(
2025-12-04T11:11:26.5076561Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.5076684Z Traceback (most recent call last):
2025-12-04T11:11:26.5077195Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5077421Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5077873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5078048Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5078573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5078790Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5078918Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5078923Z 
2025-12-04T11:11:26.5079024Z Expected 1 but got 2.
2025-12-04T11:11:26.5079141Z Absolute difference: 1
2025-12-04T11:11:26.5079246Z Relative difference: 1.0
2025-12-04T11:11:26.5079251Z 
2025-12-04T11:11:26.5079459Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5080357Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5080364Z 
2025-12-04T11:11:26.5080628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5080856Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5080970Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5081614Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5081857Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5081985Z graph_break []
2025-12-04T11:11:26.5082213Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5082936Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5083086Z   warnings.warn(
2025-12-04T11:11:26.5083810Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5083909Z   warnings.warn(
2025-12-04T11:11:26.5084121Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5084249Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5084472Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5084999Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5085095Z graph_break []
2025-12-04T11:11:26.5085305Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5086029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5086128Z   warnings.warn(
2025-12-04T11:11:26.5086847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5086945Z   warnings.warn(
2025-12-04T11:11:26.5087090Z =================================== FAILURES ===================================
2025-12-04T11:11:26.5087594Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.5087713Z Traceback (most recent call last):
2025-12-04T11:11:26.5088212Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5088451Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5088899Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5089074Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5089596Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5089796Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5089942Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5089947Z 
2025-12-04T11:11:26.5090050Z Expected 1 but got 2.
2025-12-04T11:11:26.5090153Z Absolute difference: 1
2025-12-04T11:11:26.5090272Z Relative difference: 1.0
2025-12-04T11:11:26.5090279Z 
2025-12-04T11:11:26.5090487Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5091384Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5091391Z 
2025-12-04T11:11:26.5091653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5091864Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5091989Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5092566Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5092802Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5092896Z graph_break []
2025-12-04T11:11:26.5093110Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5093868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5093996Z   warnings.warn(
2025-12-04T11:11:26.5094712Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5094807Z   warnings.warn(
2025-12-04T11:11:26.5095016Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5095143Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5095368Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5095880Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5095989Z graph_break []
2025-12-04T11:11:26.5096197Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5096919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5097017Z   warnings.warn(
2025-12-04T11:11:26.5097716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5097823Z   warnings.warn(
2025-12-04T11:11:26.5098033Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5098141Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5098380Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5098891Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5098999Z graph_break []
2025-12-04T11:11:26.5099210Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5099916Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5100028Z   warnings.warn(
2025-12-04T11:11:26.5100731Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5101031Z   warnings.warn(
2025-12-04T11:11:26.5101861Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-607f169455f7ccc0.xml -
2025-12-04T11:11:26.5102030Z =========================== short test summary info ============================
2025-12-04T11:11:26.5102961Z FAILED [0.4080s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5102967Z 
2025-12-04T11:11:26.5103069Z Expected 1 but got 2.
2025-12-04T11:11:26.5103189Z Absolute difference: 1
2025-12-04T11:11:26.5103294Z Relative difference: 1.0
2025-12-04T11:11:26.5103299Z 
2025-12-04T11:11:26.5103510Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5104407Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5104538Z 
2025-12-04T11:11:26.5104805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5105003Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.5105240Z =================== 1 failed, 8 deselected, 2 rerun in 4.61s ===================
2025-12-04T11:11:26.5105337Z Got exit code 1
2025-12-04T11:11:26.5105458Z Retrying single test...
2025-12-04T11:11:26.5105893Z W1204 11:07:21.929000 92478 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.5106583Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-531db397873a40b2.xml
2025-12-04T11:11:26.5106756Z ============================= test session starts ==============================
2025-12-04T11:11:26.5107096Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.5107218Z cachedir: .pytest_cache
2025-12-04T11:11:26.5107724Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.5107846Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.5107966Z configfile: pytest.ini
2025-12-04T11:11:26.5108493Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.5108707Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.5109676Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5109801Z Running 1 items in this shard
2025-12-04T11:11:26.5109806Z 
2025-12-04T11:11:26.5111071Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:07:25.096109213 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5111079Z 
2025-12-04T11:11:26.5111589Z [W1204 11:07:41.895403086 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5111594Z 
2025-12-04T11:11:26.5112109Z [W1204 11:07:41.895684721 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5112117Z 
2025-12-04T11:11:26.5112614Z [W1204 11:07:41.903302263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5112620Z 
2025-12-04T11:11:26.5113133Z [W1204 11:07:41.904103517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5113141Z 
2025-12-04T11:11:26.5113644Z [W1204 11:07:41.904299140 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5113649Z 
2025-12-04T11:11:26.5114150Z [W1204 11:07:41.911304172 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5114169Z 
2025-12-04T11:11:26.5114666Z [W1204 11:07:41.912046871 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5114673Z 
2025-12-04T11:11:26.5115172Z [W1204 11:07:41.912236410 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5115177Z 
2025-12-04T11:11:26.5115690Z [W1204 11:07:43.860969451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5115694Z 
2025-12-04T11:11:26.5116256Z [W1204 11:07:43.862717857 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5116261Z 
2025-12-04T11:11:26.5116770Z [W1204 11:07:43.862921406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5116803Z 
2025-12-04T11:11:26.5117304Z [W1204 11:07:43.866832964 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5117337Z 
2025-12-04T11:11:26.5117849Z [W1204 11:07:43.867494541 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5117854Z 
2025-12-04T11:11:26.5118351Z [W1204 11:07:43.867687451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5118356Z 
2025-12-04T11:11:26.5118873Z [W1204 11:07:43.873683041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5118878Z 
2025-12-04T11:11:26.5119378Z [W1204 11:07:43.874374385 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5119385Z 
2025-12-04T11:11:26.5119882Z [W1204 11:07:43.874564874 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5119907Z 
2025-12-04T11:11:26.5120039Z ('RERUN', {'yellow': True}) [19.5915s] [100%]
2025-12-04T11:11:26.5121287Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:07:43.236709142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5121293Z 
2025-12-04T11:11:26.5121871Z [W1204 11:07:43.237511092 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5121877Z 
2025-12-04T11:11:26.5122377Z [W1204 11:07:43.237718651 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5122384Z 
2025-12-04T11:11:26.5122898Z [W1204 11:07:43.241690374 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5122905Z 
2025-12-04T11:11:26.5123404Z [W1204 11:07:43.242532181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5123408Z 
2025-12-04T11:11:26.5123921Z [W1204 11:07:43.242722598 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5123926Z 
2025-12-04T11:11:26.5124426Z [W1204 11:07:43.248589134 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5124431Z 
2025-12-04T11:11:26.5124928Z [W1204 11:07:43.249220746 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5124949Z 
2025-12-04T11:11:26.5125450Z [W1204 11:07:43.249405647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5125457Z 
2025-12-04T11:11:26.5125955Z [W1204 11:07:43.336986710 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5125960Z 
2025-12-04T11:11:26.5126476Z [W1204 11:07:43.337784091 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5126481Z 
2025-12-04T11:11:26.5127040Z [W1204 11:07:43.337991768 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5127045Z 
2025-12-04T11:11:26.5127562Z [W1204 11:07:43.341962348 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5127610Z 
2025-12-04T11:11:26.5128107Z [W1204 11:07:43.342669075 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5128111Z 
2025-12-04T11:11:26.5128676Z [W1204 11:07:43.342863476 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5128680Z 
2025-12-04T11:11:26.5129176Z [W1204 11:07:43.348870113 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5129180Z 
2025-12-04T11:11:26.5129694Z [W1204 11:07:43.349755938 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5129698Z 
2025-12-04T11:11:26.5130197Z [W1204 11:07:43.349949727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5130204Z 
2025-12-04T11:11:26.5130331Z ('RERUN', {'yellow': True}) [0.4368s] [100%]
2025-12-04T11:11:26.5131584Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:07:44.648017488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5131592Z 
2025-12-04T11:11:26.5132093Z [W1204 11:07:44.648801020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5132097Z 
2025-12-04T11:11:26.5132619Z [W1204 11:07:44.649002947 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5132624Z 
2025-12-04T11:11:26.5133124Z [W1204 11:07:44.652929666 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5133132Z 
2025-12-04T11:11:26.5133645Z [W1204 11:07:44.653767752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5133650Z 
2025-12-04T11:11:26.5134143Z [W1204 11:07:44.653959206 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5134150Z 
2025-12-04T11:11:26.5134660Z [W1204 11:07:44.659862182 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5134665Z 
2025-12-04T11:11:26.5135163Z [W1204 11:07:44.660531722 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5135168Z 
2025-12-04T11:11:26.5135674Z [W1204 11:07:44.660724177 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5135681Z 
2025-12-04T11:11:26.5136174Z [W1204 11:07:44.747307708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5136178Z 
2025-12-04T11:11:26.5136675Z [W1204 11:07:44.748074996 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5136681Z 
2025-12-04T11:11:26.5137192Z [W1204 11:07:44.748278663 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5137197Z 
2025-12-04T11:11:26.5137694Z [W1204 11:07:44.752134291 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5137760Z 
2025-12-04T11:11:26.5138273Z [W1204 11:07:44.752773003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5138277Z 
2025-12-04T11:11:26.5138802Z [W1204 11:07:44.752965699 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5138806Z 
2025-12-04T11:11:26.5139311Z [W1204 11:07:44.758801988 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5139346Z 
2025-12-04T11:11:26.5139841Z [W1204 11:07:44.759589392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5139846Z 
2025-12-04T11:11:26.5140350Z [W1204 11:07:44.759778107 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5140355Z 
2025-12-04T11:11:26.5140459Z FAILED [0.4072s] [100%]
2025-12-04T11:11:26.5140464Z 
2025-12-04T11:11:26.5140604Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.5141109Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.5141230Z Traceback (most recent call last):
2025-12-04T11:11:26.5141738Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5141967Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5142419Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5142592Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5143117Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5143319Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5143462Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5143467Z 
2025-12-04T11:11:26.5143571Z Expected 1 but got 2.
2025-12-04T11:11:26.5143692Z Absolute difference: 1
2025-12-04T11:11:26.5143797Z Relative difference: 1.0
2025-12-04T11:11:26.5143801Z 
2025-12-04T11:11:26.5144009Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5144904Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5144911Z 
2025-12-04T11:11:26.5145172Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5145400Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5145512Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5146073Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5146305Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5146402Z graph_break []
2025-12-04T11:11:26.5146611Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5147799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5147914Z   if out == self.unknown_value:
2025-12-04T11:11:26.5148636Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5148735Z   warnings.warn(
2025-12-04T11:11:26.5149591Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5149704Z   warnings.warn(
2025-12-04T11:11:26.5150230Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.5150365Z Traceback (most recent call last):
2025-12-04T11:11:26.5151247Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5151504Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5151966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5152125Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5152665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5152866Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5152994Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5153001Z 
2025-12-04T11:11:26.5153116Z Expected 1 but got 2.
2025-12-04T11:11:26.5153220Z Absolute difference: 1
2025-12-04T11:11:26.5153326Z Relative difference: 1.0
2025-12-04T11:11:26.5153344Z 
2025-12-04T11:11:26.5153553Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5154432Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5154437Z 
2025-12-04T11:11:26.5154712Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5154925Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5155041Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5155570Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5155792Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5155904Z graph_break []
2025-12-04T11:11:26.5156118Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5157290Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5157417Z   if out == self.unknown_value:
2025-12-04T11:11:26.5158131Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5158240Z   warnings.warn(
2025-12-04T11:11:26.5158946Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5159045Z   warnings.warn(
2025-12-04T11:11:26.5159271Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5159384Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5159609Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5160137Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5160231Z graph_break []
2025-12-04T11:11:26.5160454Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5161219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5161317Z   warnings.warn(
2025-12-04T11:11:26.5162105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5162246Z   warnings.warn(
2025-12-04T11:11:26.5162404Z =================================== FAILURES ===================================
2025-12-04T11:11:26.5162898Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.5163047Z Traceback (most recent call last):
2025-12-04T11:11:26.5163556Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5163782Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5164232Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5164411Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5164936Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5165151Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5165280Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5165285Z 
2025-12-04T11:11:26.5165387Z Expected 1 but got 2.
2025-12-04T11:11:26.5165504Z Absolute difference: 1
2025-12-04T11:11:26.5165610Z Relative difference: 1.0
2025-12-04T11:11:26.5165615Z 
2025-12-04T11:11:26.5165822Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5166712Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5166721Z 
2025-12-04T11:11:26.5166982Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5167207Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5167322Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5167833Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5168066Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5168163Z graph_break []
2025-12-04T11:11:26.5168385Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5169561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5169677Z   if out == self.unknown_value:
2025-12-04T11:11:26.5170398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5170498Z   warnings.warn(
2025-12-04T11:11:26.5171215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5171311Z   warnings.warn(
2025-12-04T11:11:26.5171524Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5171651Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5171875Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5172409Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5172509Z graph_break []
2025-12-04T11:11:26.5172802Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5173538Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5173677Z   warnings.warn(
2025-12-04T11:11:26.5174388Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5174526Z   warnings.warn(
2025-12-04T11:11:26.5174741Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5174868Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5175093Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5175618Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5175725Z graph_break []
2025-12-04T11:11:26.5175937Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5176645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5176757Z   warnings.warn(
2025-12-04T11:11:26.5177464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5177681Z   warnings.warn(
2025-12-04T11:11:26.5178544Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-531db397873a40b2.xml -
2025-12-04T11:11:26.5178756Z =========================== short test summary info ============================
2025-12-04T11:11:26.5179949Z FAILED [0.4072s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5179955Z 
2025-12-04T11:11:26.5180186Z Expected 1 but got 2.
2025-12-04T11:11:26.5188109Z Absolute difference: 1
2025-12-04T11:11:26.5188284Z Relative difference: 1.0
2025-12-04T11:11:26.5188291Z 
2025-12-04T11:11:26.5188541Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5189456Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5189462Z 
2025-12-04T11:11:26.5189731Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5189926Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.5190130Z ================== 1 failed, 10 deselected, 2 rerun in 20.47s ==================
2025-12-04T11:11:26.5190243Z Got exit code 1
2025-12-04T11:11:26.5190348Z Retrying single test...
2025-12-04T11:11:26.5190790Z W1204 11:07:55.556000 92652 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.5191454Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a6ed46f8a6f71ef7.xml
2025-12-04T11:11:26.5191619Z ============================= test session starts ==============================
2025-12-04T11:11:26.5191980Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.5192088Z cachedir: .pytest_cache
2025-12-04T11:11:26.5192599Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.5192734Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.5192991Z configfile: pytest.ini
2025-12-04T11:11:26.5193525Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.5193768Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.5194749Z stepcurrent: skipping 8 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5194901Z Running 1 items in this shard
2025-12-04T11:11:26.5194906Z 
2025-12-04T11:11:26.5196156Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:07:59.692614152 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5196162Z 
2025-12-04T11:11:26.5196674Z [W1204 11:08:14.944741026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5196679Z 
2025-12-04T11:11:26.5197193Z [W1204 11:08:14.944997489 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5197201Z 
2025-12-04T11:11:26.5197700Z [W1204 11:08:14.952347341 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5197707Z 
2025-12-04T11:11:26.5198222Z [W1204 11:08:14.953092592 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5198226Z 
2025-12-04T11:11:26.5198726Z [W1204 11:08:14.953283246 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5198731Z 
2025-12-04T11:11:26.5199235Z [W1204 11:08:14.960194788 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5199255Z 
2025-12-04T11:11:26.5199756Z [W1204 11:08:14.960887744 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5199763Z 
2025-12-04T11:11:26.5200258Z [W1204 11:08:14.961078458 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5200265Z 
2025-12-04T11:11:26.5200778Z [W1204 11:08:16.906920855 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5200782Z 
2025-12-04T11:11:26.5201549Z [W1204 11:08:16.908640077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5201555Z 
2025-12-04T11:11:26.5202077Z [W1204 11:08:16.908843055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5202081Z 
2025-12-04T11:11:26.5202578Z [W1204 11:08:16.912724829 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5202586Z 
2025-12-04T11:11:26.5203097Z [W1204 11:08:16.913376527 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5203102Z 
2025-12-04T11:11:26.5203598Z [W1204 11:08:16.913568523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5203603Z 
2025-12-04T11:11:26.5204113Z [W1204 11:08:16.919508045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5204118Z 
2025-12-04T11:11:26.5204745Z [W1204 11:08:16.920153636 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5204751Z 
2025-12-04T11:11:26.5205253Z [W1204 11:08:16.920348056 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5205301Z 
2025-12-04T11:11:26.5205443Z ('RERUN', {'yellow': True}) [19.0097s] [100%]
2025-12-04T11:11:26.5206688Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:08:16.281777427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5206736Z 
2025-12-04T11:11:26.5207250Z [W1204 11:08:16.282559976 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5207254Z 
2025-12-04T11:11:26.5207755Z [W1204 11:08:16.282756834 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5207760Z 
2025-12-04T11:11:26.5208271Z [W1204 11:08:16.286660617 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5208278Z 
2025-12-04T11:11:26.5208779Z [W1204 11:08:16.287461036 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5208783Z 
2025-12-04T11:11:26.5209297Z [W1204 11:08:16.287650728 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5209302Z 
2025-12-04T11:11:26.5209798Z [W1204 11:08:16.293628739 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5209803Z 
2025-12-04T11:11:26.5210304Z [W1204 11:08:16.294267684 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5210322Z 
2025-12-04T11:11:26.5210822Z [W1204 11:08:16.294451033 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5210829Z 
2025-12-04T11:11:26.5211322Z [W1204 11:08:16.381661027 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5211327Z 
2025-12-04T11:11:26.5211839Z [W1204 11:08:16.382436194 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5211847Z 
2025-12-04T11:11:26.5212344Z [W1204 11:08:16.382632566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5212349Z 
2025-12-04T11:11:26.5212861Z [W1204 11:08:16.386514550 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5212866Z 
2025-12-04T11:11:26.5213365Z [W1204 11:08:16.387176268 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5213372Z 
2025-12-04T11:11:26.5213886Z [W1204 11:08:16.387367770 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5213891Z 
2025-12-04T11:11:26.5214388Z [W1204 11:08:16.393429266 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5214394Z 
2025-12-04T11:11:26.5214901Z [W1204 11:08:16.394299259 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5214905Z 
2025-12-04T11:11:26.5215400Z [W1204 11:08:16.394491325 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5215463Z 
2025-12-04T11:11:26.5215592Z ('RERUN', {'yellow': True}) [0.4359s] [100%]
2025-12-04T11:11:26.5216840Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 [W1204 11:08:17.694085868 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5216875Z 
2025-12-04T11:11:26.5217376Z [W1204 11:08:17.694868045 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5217409Z 
2025-12-04T11:11:26.5217919Z [W1204 11:08:17.695063346 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5217924Z 
2025-12-04T11:11:26.5218422Z [W1204 11:08:17.698966477 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5218431Z 
2025-12-04T11:11:26.5218939Z [W1204 11:08:17.699767145 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5218943Z 
2025-12-04T11:11:26.5219441Z [W1204 11:08:17.699952906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5219445Z 
2025-12-04T11:11:26.5219955Z [W1204 11:08:17.705935317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5219962Z 
2025-12-04T11:11:26.5220457Z [W1204 11:08:17.706605906 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5220462Z 
2025-12-04T11:11:26.5220959Z [W1204 11:08:17.706792643 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5220978Z 
2025-12-04T11:11:26.5221481Z [W1204 11:08:17.795012745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5221486Z 
2025-12-04T11:11:26.5221986Z [W1204 11:08:17.795805866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5221992Z 
2025-12-04T11:11:26.5222503Z [W1204 11:08:17.796004798 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5222510Z 
2025-12-04T11:11:26.5223006Z [W1204 11:08:17.799969690 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5223010Z 
2025-12-04T11:11:26.5223517Z [W1204 11:08:17.800639396 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5223522Z 
2025-12-04T11:11:26.5224022Z [W1204 11:08:17.800840771 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5224026Z 
2025-12-04T11:11:26.5224536Z [W1204 11:08:17.806800310 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5224542Z 
2025-12-04T11:11:26.5225040Z [W1204 11:08:17.807635215 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5225047Z 
2025-12-04T11:11:26.5225561Z [W1204 11:08:17.807876956 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5225566Z 
2025-12-04T11:11:26.5225663Z FAILED [0.4121s] [100%]
2025-12-04T11:11:26.5225668Z 
2025-12-04T11:11:26.5225811Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.5226396Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.5226517Z Traceback (most recent call last):
2025-12-04T11:11:26.5227029Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5227290Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5227743Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5227949Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5228477Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5228680Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5228824Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5228829Z 
2025-12-04T11:11:26.5228932Z Expected 1 but got 2.
2025-12-04T11:11:26.5229054Z Absolute difference: 1
2025-12-04T11:11:26.5229161Z Relative difference: 1.0
2025-12-04T11:11:26.5229165Z 
2025-12-04T11:11:26.5229375Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5230272Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5230278Z 
2025-12-04T11:11:26.5230541Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5230772Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5230885Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5231406Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5231642Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5231738Z graph_break []
2025-12-04T11:11:26.5231953Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5233142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5233259Z   if out == self.unknown_value:
2025-12-04T11:11:26.5233984Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5234083Z   warnings.warn(
2025-12-04T11:11:26.5234782Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5234892Z   warnings.warn(
2025-12-04T11:11:26.5235391Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.5235525Z Traceback (most recent call last):
2025-12-04T11:11:26.5236019Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5236248Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5236710Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5236875Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5237411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5237617Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5237750Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5237756Z 
2025-12-04T11:11:26.5237932Z Expected 1 but got 2.
2025-12-04T11:11:26.5238038Z Absolute difference: 1
2025-12-04T11:11:26.5238143Z Relative difference: 1.0
2025-12-04T11:11:26.5238148Z 
2025-12-04T11:11:26.5238372Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5239280Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5239315Z 
2025-12-04T11:11:26.5239592Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5239805Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5239918Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5240448Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5240676Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5240788Z graph_break []
2025-12-04T11:11:26.5240998Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5242241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5242373Z   if out == self.unknown_value:
2025-12-04T11:11:26.5243079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5243192Z   warnings.warn(
2025-12-04T11:11:26.5243897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5243997Z   warnings.warn(
2025-12-04T11:11:26.5244224Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5244336Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5244558Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5245091Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5245188Z graph_break []
2025-12-04T11:11:26.5245415Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5246126Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5246223Z   warnings.warn(
2025-12-04T11:11:26.5246940Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5247038Z   warnings.warn(
2025-12-04T11:11:26.5247179Z =================================== FAILURES ===================================
2025-12-04T11:11:26.5247684Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 _
2025-12-04T11:11:26.5247803Z Traceback (most recent call last):
2025-12-04T11:11:26.5248315Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5248543Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5248990Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5249163Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5249892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5250115Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5250245Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5250252Z 
2025-12-04T11:11:26.5250385Z Expected 1 but got 2.
2025-12-04T11:11:26.5250552Z Absolute difference: 1
2025-12-04T11:11:26.5250689Z Relative difference: 1.0
2025-12-04T11:11:26.5250693Z 
2025-12-04T11:11:26.5250908Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5251931Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5251937Z 
2025-12-04T11:11:26.5252202Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5252431Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5252555Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5253075Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5253313Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5253412Z graph_break []
2025-12-04T11:11:26.5253636Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5254813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5254929Z   if out == self.unknown_value:
2025-12-04T11:11:26.5255657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5255761Z   warnings.warn(
2025-12-04T11:11:26.5256481Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5256580Z   warnings.warn(
2025-12-04T11:11:26.5256790Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5256920Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5257144Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5257663Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5257775Z graph_break []
2025-12-04T11:11:26.5257987Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5258711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5258807Z   warnings.warn(
2025-12-04T11:11:26.5259511Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5259625Z   warnings.warn(
2025-12-04T11:11:26.5259832Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5259947Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5260185Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5260701Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5260809Z graph_break []
2025-12-04T11:11:26.5261019Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5261823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5261934Z   warnings.warn(
2025-12-04T11:11:26.5262635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5262786Z   warnings.warn(
2025-12-04T11:11:26.5263608Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a6ed46f8a6f71ef7.xml -
2025-12-04T11:11:26.5263829Z =========================== short test summary info ============================
2025-12-04T11:11:26.5264769Z FAILED [0.4121s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5264776Z 
2025-12-04T11:11:26.5264882Z Expected 1 but got 2.
2025-12-04T11:11:26.5264999Z Absolute difference: 1
2025-12-04T11:11:26.5265104Z Relative difference: 1.0
2025-12-04T11:11:26.5265109Z 
2025-12-04T11:11:26.5265321Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5266220Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5266227Z 
2025-12-04T11:11:26.5266491Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5266684Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.5266877Z ================== 1 failed, 10 deselected, 2 rerun in 19.89s ==================
2025-12-04T11:11:26.5266973Z Got exit code 1
2025-12-04T11:11:26.5267788Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16
2025-12-04T11:11:26.5268190Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:11:26.5268629Z W1204 11:08:28.626000 92826 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.5269294Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ab81f77c2cb5952.xml
2025-12-04T11:11:26.5269459Z ============================= test session starts ==============================
2025-12-04T11:11:26.5269813Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.5269922Z cachedir: .pytest_cache
2025-12-04T11:11:26.5270435Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.5270575Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.5270682Z configfile: pytest.ini
2025-12-04T11:11:26.5271227Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.5271443Z collecting ... collected 58 items / 9 deselected / 49 selected
2025-12-04T11:11:26.5271582Z stepcurrent: skipping 9 already run items.
2025-12-04T11:11:26.5271709Z Running 2 items in this shard
2025-12-04T11:11:26.5271716Z 
2025-12-04T11:11:26.5272568Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [3.8097s] [ 50%]
2025-12-04T11:11:26.5273490Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4874s] [ 50%]
2025-12-04T11:11:26.5274355Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 FAILED [0.4819s] [ 50%]
2025-12-04T11:11:26.5274361Z 
2025-12-04T11:11:26.5274501Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.5275043Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.5275164Z Traceback (most recent call last):
2025-12-04T11:11:26.5275720Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5275952Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5276411Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5276585Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5277118Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5277339Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5277470Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5277475Z 
2025-12-04T11:11:26.5277578Z Expected 1 but got 2.
2025-12-04T11:11:26.5277697Z Absolute difference: 1
2025-12-04T11:11:26.5277806Z Relative difference: 1.0
2025-12-04T11:11:26.5277811Z 
2025-12-04T11:11:26.5278022Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5278930Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5278936Z 
2025-12-04T11:11:26.5279199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5279435Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5279548Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5280066Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5280304Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5280401Z graph_break []
2025-12-04T11:11:26.5280628Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5281347Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5281446Z   warnings.warn(
2025-12-04T11:11:26.5282242Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5282347Z   warnings.warn(
2025-12-04T11:11:26.5282852Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.5282989Z Traceback (most recent call last):
2025-12-04T11:11:26.5283492Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5283734Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5284182Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5284349Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5284892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5285095Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5285243Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5285343Z 
2025-12-04T11:11:26.5285446Z Expected 1 but got 2.
2025-12-04T11:11:26.5285551Z Absolute difference: 1
2025-12-04T11:11:26.5285669Z Relative difference: 1.0
2025-12-04T11:11:26.5285674Z 
2025-12-04T11:11:26.5285916Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5286806Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5286855Z 
2025-12-04T11:11:26.5287116Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5287330Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5287458Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5287979Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5288204Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5288311Z graph_break []
2025-12-04T11:11:26.5288521Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5289253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5289350Z   warnings.warn(
2025-12-04T11:11:26.5290061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5290170Z   warnings.warn(
2025-12-04T11:11:26.5290387Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5290497Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5290735Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5291250Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5291358Z graph_break []
2025-12-04T11:11:26.5291570Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5292286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5292393Z   warnings.warn(
2025-12-04T11:11:26.5293099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5293207Z   warnings.warn(
2025-12-04T11:11:26.5293350Z =================================== FAILURES ===================================
2025-12-04T11:11:26.5293853Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.5293984Z Traceback (most recent call last):
2025-12-04T11:11:26.5294489Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5294718Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5295181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5295343Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5295882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5296083Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5296212Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5296217Z 
2025-12-04T11:11:26.5296332Z Expected 1 but got 2.
2025-12-04T11:11:26.5296587Z Absolute difference: 1
2025-12-04T11:11:26.5296697Z Relative difference: 1.0
2025-12-04T11:11:26.5296702Z 
2025-12-04T11:11:26.5296928Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5297822Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5298409Z 
2025-12-04T11:11:26.5298691Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5298943Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5299057Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5299592Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5299815Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5299928Z graph_break []
2025-12-04T11:11:26.5300140Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5301042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5301160Z   warnings.warn(
2025-12-04T11:11:26.5301870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5301984Z   warnings.warn(
2025-12-04T11:11:26.5302194Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5302304Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5302538Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5303059Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5303154Z graph_break []
2025-12-04T11:11:26.5303379Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5304085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5304197Z   warnings.warn(
2025-12-04T11:11:26.5304897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5304994Z   warnings.warn(
2025-12-04T11:11:26.5305217Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5305327Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5305547Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5306081Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5306175Z graph_break []
2025-12-04T11:11:26.5306400Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5307110Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5307206Z   warnings.warn(
2025-12-04T11:11:26.5307922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5308020Z   warnings.warn(
2025-12-04T11:11:26.5308853Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ab81f77c2cb5952.xml -
2025-12-04T11:11:26.5309168Z =========================== short test summary info ============================
2025-12-04T11:11:26.5310095Z FAILED [0.4819s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5310166Z 
2025-12-04T11:11:26.5310288Z Expected 1 but got 2.
2025-12-04T11:11:26.5310394Z Absolute difference: 1
2025-12-04T11:11:26.5310503Z Relative difference: 1.0
2025-12-04T11:11:26.5310523Z 
2025-12-04T11:11:26.5310782Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5311672Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5311677Z 
2025-12-04T11:11:26.5311952Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5312137Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.5312345Z =================== 1 failed, 9 deselected, 2 rerun in 4.81s ===================
2025-12-04T11:11:26.5312440Z Got exit code 1
2025-12-04T11:11:26.5312546Z Retrying single test...
2025-12-04T11:11:26.5313002Z W1204 11:08:48.570000 93002 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.5313650Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38d9a64e046ee91f.xml
2025-12-04T11:11:26.5313812Z ============================= test session starts ==============================
2025-12-04T11:11:26.5314171Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.5314279Z cachedir: .pytest_cache
2025-12-04T11:11:26.5314806Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.5314927Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.5315030Z configfile: pytest.ini
2025-12-04T11:11:26.5315571Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.5315788Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.5316750Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5316876Z Running 1 items in this shard
2025-12-04T11:11:26.5316881Z 
2025-12-04T11:11:26.5318133Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:08:52.791913890 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5318139Z 
2025-12-04T11:11:26.5318659Z [W1204 11:09:07.065180470 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5318667Z 
2025-12-04T11:11:26.5319166Z [W1204 11:09:07.065436755 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5319197Z 
2025-12-04T11:11:26.5319698Z [W1204 11:09:07.072816011 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5319703Z 
2025-12-04T11:11:26.5320200Z [W1204 11:09:07.073546411 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5320205Z 
2025-12-04T11:11:26.5320787Z [W1204 11:09:07.073731562 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5320792Z 
2025-12-04T11:11:26.5321290Z [W1204 11:09:07.080557560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5321324Z 
2025-12-04T11:11:26.5321928Z [W1204 11:09:07.081221060 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5321935Z 
2025-12-04T11:11:26.5322437Z [W1204 11:09:07.081402320 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5322487Z 
2025-12-04T11:11:26.5322999Z [W1204 11:09:09.032476697 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5323004Z 
2025-12-04T11:11:26.5323507Z [W1204 11:09:09.034189229 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5323511Z 
2025-12-04T11:11:26.5324026Z [W1204 11:09:09.034391449 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5324032Z 
2025-12-04T11:11:26.5324528Z [W1204 11:09:09.038208488 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5324533Z 
2025-12-04T11:11:26.5325028Z [W1204 11:09:09.038842622 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5325049Z 
2025-12-04T11:11:26.5325549Z [W1204 11:09:09.039031517 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5325553Z 
2025-12-04T11:11:26.5326052Z [W1204 11:09:09.044913055 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5326061Z 
2025-12-04T11:11:26.5326574Z [W1204 11:09:09.045537624 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5326579Z 
2025-12-04T11:11:26.5327077Z [W1204 11:09:09.045725149 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5327081Z 
2025-12-04T11:11:26.5327228Z ('RERUN', {'yellow': True}) [19.1233s] [100%]
2025-12-04T11:11:26.5328479Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:09:09.482210351 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5328487Z 
2025-12-04T11:11:26.5329001Z [W1204 11:09:09.482985376 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5329009Z 
2025-12-04T11:11:26.5329512Z [W1204 11:09:09.483181610 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5329517Z 
2025-12-04T11:11:26.5330031Z [W1204 11:09:09.487043780 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5330036Z 
2025-12-04T11:11:26.5330531Z [W1204 11:09:09.487866062 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5330538Z 
2025-12-04T11:11:26.5331034Z [W1204 11:09:09.488052338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5331039Z 
2025-12-04T11:11:26.5331552Z [W1204 11:09:09.494020881 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5331556Z 
2025-12-04T11:11:26.5332126Z [W1204 11:09:09.494706595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5332132Z 
2025-12-04T11:11:26.5332646Z [W1204 11:09:09.494893338 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5332688Z 
2025-12-04T11:11:26.5333187Z [W1204 11:09:09.583668041 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5333222Z 
2025-12-04T11:11:26.5333734Z [W1204 11:09:09.584457827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5333739Z 
2025-12-04T11:11:26.5334237Z [W1204 11:09:09.584660440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5334242Z 
2025-12-04T11:11:26.5334756Z [W1204 11:09:09.588498667 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5334761Z 
2025-12-04T11:11:26.5335261Z [W1204 11:09:09.589130188 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5335267Z 
2025-12-04T11:11:26.5335765Z [W1204 11:09:09.589319533 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5335784Z 
2025-12-04T11:11:26.5336281Z [W1204 11:09:09.595198599 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5336285Z 
2025-12-04T11:11:26.5336784Z [W1204 11:09:09.596033506 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5336789Z 
2025-12-04T11:11:26.5337303Z [W1204 11:09:09.596222025 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5337307Z 
2025-12-04T11:11:26.5337434Z ('RERUN', {'yellow': True}) [0.5124s] [100%]
2025-12-04T11:11:26.5338701Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:09:10.974726090 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5338709Z 
2025-12-04T11:11:26.5339209Z [W1204 11:09:10.975503333 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5339213Z 
2025-12-04T11:11:26.5339727Z [W1204 11:09:10.975699747 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5339732Z 
2025-12-04T11:11:26.5340234Z [W1204 11:09:10.979621523 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5340238Z 
2025-12-04T11:11:26.5340754Z [W1204 11:09:10.980462289 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5340761Z 
2025-12-04T11:11:26.5341261Z [W1204 11:09:10.980655901 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5341268Z 
2025-12-04T11:11:26.5341762Z [W1204 11:09:10.986591408 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5341778Z 
2025-12-04T11:11:26.5342272Z [W1204 11:09:10.987279703 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5342277Z 
2025-12-04T11:11:26.5342841Z [W1204 11:09:10.987466204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5342846Z 
2025-12-04T11:11:26.5343350Z [W1204 11:09:10.075543958 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5343386Z 
2025-12-04T11:11:26.5343884Z [W1204 11:09:10.076313662 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5343888Z 
2025-12-04T11:11:26.5344433Z [W1204 11:09:10.076511495 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5344438Z 
2025-12-04T11:11:26.5344932Z [W1204 11:09:10.080352593 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5344936Z 
2025-12-04T11:11:26.5345445Z [W1204 11:09:10.080978799 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5345449Z 
2025-12-04T11:11:26.5345950Z [W1204 11:09:10.081166826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5345956Z 
2025-12-04T11:11:26.5346464Z [W1204 11:09:10.086946826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5346469Z 
2025-12-04T11:11:26.5346966Z [W1204 11:09:10.087729460 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5346972Z 
2025-12-04T11:11:26.5347466Z [W1204 11:09:10.087916252 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5347471Z 
2025-12-04T11:11:26.5347580Z FAILED [0.4920s] [100%]
2025-12-04T11:11:26.5347585Z 
2025-12-04T11:11:26.5347731Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.5348248Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.5348373Z Traceback (most recent call last):
2025-12-04T11:11:26.5348880Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5349119Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5349577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5349754Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5350286Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5350488Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5350632Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5350638Z 
2025-12-04T11:11:26.5350745Z Expected 1 but got 2.
2025-12-04T11:11:26.5350855Z Absolute difference: 1
2025-12-04T11:11:26.5350981Z Relative difference: 1.0
2025-12-04T11:11:26.5350988Z 
2025-12-04T11:11:26.5351202Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5352105Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5352112Z 
2025-12-04T11:11:26.5352375Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5352590Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5352721Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5353335Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5353575Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5353673Z graph_break []
2025-12-04T11:11:26.5353884Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5355109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5355255Z   if out == self.unknown_value:
2025-12-04T11:11:26.5356030Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5356131Z   warnings.warn(
2025-12-04T11:11:26.5356845Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5356954Z   warnings.warn(
2025-12-04T11:11:26.5357454Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.5357574Z Traceback (most recent call last):
2025-12-04T11:11:26.5358082Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5358310Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5358770Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5358933Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5359458Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5359680Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5359812Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5359817Z 
2025-12-04T11:11:26.5359950Z Expected 1 but got 2.
2025-12-04T11:11:26.5360054Z Absolute difference: 1
2025-12-04T11:11:26.5360168Z Relative difference: 1.0
2025-12-04T11:11:26.5360173Z 
2025-12-04T11:11:26.5360396Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5361280Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5361287Z 
2025-12-04T11:11:26.5361628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5361848Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5361962Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5362495Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5362718Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5362815Z graph_break []
2025-12-04T11:11:26.5363046Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5364233Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5364364Z   if out == self.unknown_value:
2025-12-04T11:11:26.5365074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5365176Z   warnings.warn(
2025-12-04T11:11:26.5365981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5366079Z   warnings.warn(
2025-12-04T11:11:26.5366304Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5366450Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5366670Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5367200Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5367334Z graph_break []
2025-12-04T11:11:26.5367545Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5368269Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5368366Z   warnings.warn(
2025-12-04T11:11:26.5369082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5369177Z   warnings.warn(
2025-12-04T11:11:26.5369320Z =================================== FAILURES ===================================
2025-12-04T11:11:26.5369834Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.5369952Z Traceback (most recent call last):
2025-12-04T11:11:26.5370450Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5370689Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5371138Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5371312Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5371841Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5372042Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5372186Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5372191Z 
2025-12-04T11:11:26.5372293Z Expected 1 but got 2.
2025-12-04T11:11:26.5372409Z Absolute difference: 1
2025-12-04T11:11:26.5372517Z Relative difference: 1.0
2025-12-04T11:11:26.5372524Z 
2025-12-04T11:11:26.5372735Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5373632Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5373637Z 
2025-12-04T11:11:26.5373898Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5374127Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5374240Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5374760Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5374998Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5375093Z graph_break []
2025-12-04T11:11:26.5375303Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5376495Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5376610Z   if out == self.unknown_value:
2025-12-04T11:11:26.5377410Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5377509Z   warnings.warn(
2025-12-04T11:11:26.5378214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5378357Z   warnings.warn(
2025-12-04T11:11:26.5378570Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5378728Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5378954Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5379467Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5379580Z graph_break []
2025-12-04T11:11:26.5379792Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5380502Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5380613Z   warnings.warn(
2025-12-04T11:11:26.5381322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5381433Z   warnings.warn(
2025-12-04T11:11:26.5381643Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5381757Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5381994Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5382509Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5382605Z graph_break []
2025-12-04T11:11:26.5382831Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5383536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5383648Z   warnings.warn(
2025-12-04T11:11:26.5384350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5384447Z   warnings.warn(
2025-12-04T11:11:26.5385280Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38d9a64e046ee91f.xml -
2025-12-04T11:11:26.5385450Z =========================== short test summary info ============================
2025-12-04T11:11:26.5386386Z FAILED [0.4920s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5386393Z 
2025-12-04T11:11:26.5386495Z Expected 1 but got 2.
2025-12-04T11:11:26.5386597Z Absolute difference: 1
2025-12-04T11:11:26.5386714Z Relative difference: 1.0
2025-12-04T11:11:26.5386721Z 
2025-12-04T11:11:26.5386931Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5387828Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5387835Z 
2025-12-04T11:11:26.5388095Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5388273Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.5388481Z ================== 1 failed, 10 deselected, 2 rerun in 20.16s ==================
2025-12-04T11:11:26.5388575Z Got exit code 1
2025-12-04T11:11:26.5388761Z Retrying single test...
2025-12-04T11:11:26.5389203Z W1204 11:09:21.990000 93184 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.5389848Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-747a72e37803dfe4.xml
2025-12-04T11:11:26.5390056Z ============================= test session starts ==============================
2025-12-04T11:11:26.5390400Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.5390538Z cachedir: .pytest_cache
2025-12-04T11:11:26.5391066Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.5391188Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.5391308Z configfile: pytest.ini
2025-12-04T11:11:26.5391842Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.5392058Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.5393043Z stepcurrent: skipping 9 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5393157Z Running 1 items in this shard
2025-12-04T11:11:26.5393162Z 
2025-12-04T11:11:26.5394432Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:09:25.195906126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5394438Z 
2025-12-04T11:11:26.5394948Z [W1204 11:09:40.398115658 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5394953Z 
2025-12-04T11:11:26.5395473Z [W1204 11:09:40.398378255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5395480Z 
2025-12-04T11:11:26.5395981Z [W1204 11:09:40.405598016 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5395986Z 
2025-12-04T11:11:26.5396484Z [W1204 11:09:40.406345618 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5396505Z 
2025-12-04T11:11:26.5397004Z [W1204 11:09:40.406535097 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5397009Z 
2025-12-04T11:11:26.5397505Z [W1204 11:09:40.413450371 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5397513Z 
2025-12-04T11:11:26.5398031Z [W1204 11:09:40.414163491 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5398036Z 
2025-12-04T11:11:26.5398538Z [W1204 11:09:40.414347930 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5398543Z 
2025-12-04T11:11:26.5399055Z [W1204 11:09:42.359772077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5399062Z 
2025-12-04T11:11:26.5399559Z [W1204 11:09:42.361583001 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5399564Z 
2025-12-04T11:11:26.5400078Z [W1204 11:09:42.361804725 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5400083Z 
2025-12-04T11:11:26.5400667Z [W1204 11:09:42.365880068 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5400672Z 
2025-12-04T11:11:26.5401389Z [W1204 11:09:42.366575751 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5401564Z 
2025-12-04T11:11:26.5402070Z [W1204 11:09:42.366770200 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5402124Z 
2025-12-04T11:11:26.5402624Z [W1204 11:09:42.372863303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5402643Z 
2025-12-04T11:11:26.5403143Z [W1204 11:09:42.373555028 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5403148Z 
2025-12-04T11:11:26.5403654Z [W1204 11:09:42.373746518 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5403659Z 
2025-12-04T11:11:26.5403805Z ('RERUN', {'yellow': True}) [19.0303s] [100%]
2025-12-04T11:11:26.5405065Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:09:43.814019647 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5405072Z 
2025-12-04T11:11:26.5405595Z [W1204 11:09:43.814802242 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5405599Z 
2025-12-04T11:11:26.5406102Z [W1204 11:09:43.815000342 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5406106Z 
2025-12-04T11:11:26.5406623Z [W1204 11:09:43.818921691 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5406628Z 
2025-12-04T11:11:26.5407125Z [W1204 11:09:43.819733326 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5407131Z 
2025-12-04T11:11:26.5407632Z [W1204 11:09:43.819931608 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5407656Z 
2025-12-04T11:11:26.5408153Z [W1204 11:09:43.825921727 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5408158Z 
2025-12-04T11:11:26.5408662Z [W1204 11:09:43.826581911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5408666Z 
2025-12-04T11:11:26.5409180Z [W1204 11:09:43.826768185 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5409185Z 
2025-12-04T11:11:26.5409685Z [W1204 11:09:43.916986640 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5409692Z 
2025-12-04T11:11:26.5410205Z [W1204 11:09:43.917791597 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5410212Z 
2025-12-04T11:11:26.5410713Z [W1204 11:09:43.917997570 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5410718Z 
2025-12-04T11:11:26.5411226Z [W1204 11:09:43.921991175 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5411231Z 
2025-12-04T11:11:26.5411818Z [W1204 11:09:43.922648392 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5411823Z 
2025-12-04T11:11:26.5412334Z [W1204 11:09:43.922840997 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5412367Z 
2025-12-04T11:11:26.5412864Z [W1204 11:09:43.928804275 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5412868Z 
2025-12-04T11:11:26.5413398Z [W1204 11:09:43.929616898 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5413415Z 
2025-12-04T11:11:26.5413911Z [W1204 11:09:43.929809630 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5413916Z 
2025-12-04T11:11:26.5414078Z ('RERUN', {'yellow': True}) [0.5175s] [100%]
2025-12-04T11:11:26.5415342Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 [W1204 11:09:43.306775501 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5415349Z 
2025-12-04T11:11:26.5415855Z [W1204 11:09:43.307541547 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5415861Z 
2025-12-04T11:11:26.5416371Z [W1204 11:09:43.307739054 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5416376Z 
2025-12-04T11:11:26.5416873Z [W1204 11:09:43.311682413 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5416878Z 
2025-12-04T11:11:26.5417393Z [W1204 11:09:43.312488741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5417397Z 
2025-12-04T11:11:26.5417894Z [W1204 11:09:43.312676250 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5417900Z 
2025-12-04T11:11:26.5418406Z [W1204 11:09:43.318675309 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5418411Z 
2025-12-04T11:11:26.5418909Z [W1204 11:09:43.319320204 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5418913Z 
2025-12-04T11:11:26.5419410Z [W1204 11:09:43.319505826 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5419415Z 
2025-12-04T11:11:26.5419931Z [W1204 11:09:43.408899566 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5419935Z 
2025-12-04T11:11:26.5420431Z [W1204 11:09:43.409718524 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5420438Z 
2025-12-04T11:11:26.5420945Z [W1204 11:09:43.409927851 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5420950Z 
2025-12-04T11:11:26.5421448Z [W1204 11:09:43.413952563 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5421454Z 
2025-12-04T11:11:26.5421965Z [W1204 11:09:43.414652950 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5421969Z 
2025-12-04T11:11:26.5422532Z [W1204 11:09:43.414850077 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5422537Z 
2025-12-04T11:11:26.5423054Z [W1204 11:09:43.420933415 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5423092Z 
2025-12-04T11:11:26.5423589Z [W1204 11:09:43.421808858 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5423594Z 
2025-12-04T11:11:26.5424089Z [W1204 11:09:43.422005226 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5424137Z 
2025-12-04T11:11:26.5424239Z FAILED [0.4931s] [100%]
2025-12-04T11:11:26.5424244Z 
2025-12-04T11:11:26.5424386Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.5424899Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.5425025Z Traceback (most recent call last):
2025-12-04T11:11:26.5425528Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5425767Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5426224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5426396Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5426923Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5427125Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5427267Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5427271Z 
2025-12-04T11:11:26.5427375Z Expected 1 but got 2.
2025-12-04T11:11:26.5427490Z Absolute difference: 1
2025-12-04T11:11:26.5427598Z Relative difference: 1.0
2025-12-04T11:11:26.5427606Z 
2025-12-04T11:11:26.5427816Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5428723Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5428731Z 
2025-12-04T11:11:26.5428990Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5429220Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5429333Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5429855Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5430086Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5430182Z graph_break []
2025-12-04T11:11:26.5430399Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5431599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5431714Z   if out == self.unknown_value:
2025-12-04T11:11:26.5432439Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5432540Z   warnings.warn(
2025-12-04T11:11:26.5433243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5433353Z   warnings.warn(
2025-12-04T11:11:26.5433916Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.5434051Z Traceback (most recent call last):
2025-12-04T11:11:26.5434545Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5434802Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5435263Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5435424Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5435996Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5436214Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5436344Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5436349Z 
2025-12-04T11:11:26.5436470Z Expected 1 but got 2.
2025-12-04T11:11:26.5436578Z Absolute difference: 1
2025-12-04T11:11:26.5436690Z Relative difference: 1.0
2025-12-04T11:11:26.5436695Z 
2025-12-04T11:11:26.5436917Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5437807Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5437814Z 
2025-12-04T11:11:26.5438090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5438305Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5438419Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5438952Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5439175Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5439271Z graph_break []
2025-12-04T11:11:26.5439502Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5440678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5440805Z   if out == self.unknown_value:
2025-12-04T11:11:26.5441608Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5441713Z   warnings.warn(
2025-12-04T11:11:26.5442433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5442531Z   warnings.warn(
2025-12-04T11:11:26.5442762Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5442875Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5443097Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5443631Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5443728Z graph_break []
2025-12-04T11:11:26.5443939Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5444666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5444762Z   warnings.warn(
2025-12-04T11:11:26.5445478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5445574Z   warnings.warn(
2025-12-04T11:11:26.5445794Z =================================== FAILURES ===================================
2025-12-04T11:11:26.5446311Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 _
2025-12-04T11:11:26.5446459Z Traceback (most recent call last):
2025-12-04T11:11:26.5446970Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5447196Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5447673Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5447848Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5448370Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5448575Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5448714Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5448720Z 
2025-12-04T11:11:26.5448822Z Expected 1 but got 2.
2025-12-04T11:11:26.5448937Z Absolute difference: 1
2025-12-04T11:11:26.5449044Z Relative difference: 1.0
2025-12-04T11:11:26.5449050Z 
2025-12-04T11:11:26.5449259Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5450156Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5450164Z 
2025-12-04T11:11:26.5450426Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5450653Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5450763Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5451283Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5451517Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5451613Z graph_break []
2025-12-04T11:11:26.5451830Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5453028Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5453150Z   if out == self.unknown_value:
2025-12-04T11:11:26.5453879Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5453977Z   warnings.warn(
2025-12-04T11:11:26.5454688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5454801Z   warnings.warn(
2025-12-04T11:11:26.5455014Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5455142Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5455364Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5455879Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5455991Z graph_break []
2025-12-04T11:11:26.5456202Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5456922Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5457022Z   warnings.warn(
2025-12-04T11:11:26.5457787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5457904Z   warnings.warn(
2025-12-04T11:11:26.5458144Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5458253Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5458485Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5458998Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2)]
2025-12-04T11:11:26.5459136Z graph_break []
2025-12-04T11:11:26.5459346Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5460053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5460168Z   warnings.warn(
2025-12-04T11:11:26.5460874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5460976Z   warnings.warn(
2025-12-04T11:11:26.5461819Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-747a72e37803dfe4.xml -
2025-12-04T11:11:26.5461987Z =========================== short test summary info ============================
2025-12-04T11:11:26.5462926Z FAILED [0.4931s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5462932Z 
2025-12-04T11:11:26.5463039Z Expected 1 but got 2.
2025-12-04T11:11:26.5463144Z Absolute difference: 1
2025-12-04T11:11:26.5463274Z Relative difference: 1.0
2025-12-04T11:11:26.5463279Z 
2025-12-04T11:11:26.5463493Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5464402Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5464409Z 
2025-12-04T11:11:26.5464675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5464855Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.5465065Z ================== 1 failed, 10 deselected, 2 rerun in 20.07s ==================
2025-12-04T11:11:26.5465168Z Got exit code 1
2025-12-04T11:11:26.5465988Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16
2025-12-04T11:11:26.5466393Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:11:26.5466836Z W1204 11:09:55.242000 93366 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.5467495Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-54023c099f6c1322.xml
2025-12-04T11:11:26.5467659Z ============================= test session starts ==============================
2025-12-04T11:11:26.5468020Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.5468130Z cachedir: .pytest_cache
2025-12-04T11:11:26.5468640Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.5468809Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.5468915Z configfile: pytest.ini
2025-12-04T11:11:26.5469512Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.5469746Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.5469915Z stepcurrent: skipping 10 already run items.
2025-12-04T11:11:26.5470043Z Running 1 items in this shard
2025-12-04T11:11:26.5470048Z 
2025-12-04T11:11:26.5470892Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [4.0462s] [100%]
2025-12-04T11:11:26.5471852Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 ('RERUN', {'yellow': True}) [0.4497s] [100%]
2025-12-04T11:11:26.5472628Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 FAILED [0.4494s] [100%]
2025-12-04T11:11:26.5472634Z 
2025-12-04T11:11:26.5472773Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.5473275Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5473398Z Traceback (most recent call last):
2025-12-04T11:11:26.5473913Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5474144Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5474592Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5474766Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5475295Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5475498Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5475644Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5475649Z 
2025-12-04T11:11:26.5475756Z Expected 1 but got 2.
2025-12-04T11:11:26.5475873Z Absolute difference: 1
2025-12-04T11:11:26.5475977Z Relative difference: 1.0
2025-12-04T11:11:26.5475982Z 
2025-12-04T11:11:26.5476193Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5477082Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5477089Z 
2025-12-04T11:11:26.5477351Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5477576Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5477691Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5478558Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5478793Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5478888Z graph_break []
2025-12-04T11:11:26.5479098Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5479833Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5479931Z   warnings.warn(
2025-12-04T11:11:26.5480650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5480748Z   warnings.warn(
2025-12-04T11:11:26.5481302Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5481435Z Traceback (most recent call last):
2025-12-04T11:11:26.5482008Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5482323Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5482775Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5482969Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5483509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5483710Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5483836Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5483855Z 
2025-12-04T11:11:26.5483963Z Expected 1 but got 2.
2025-12-04T11:11:26.5484066Z Absolute difference: 1
2025-12-04T11:11:26.5484191Z Relative difference: 1.0
2025-12-04T11:11:26.5484196Z 
2025-12-04T11:11:26.5484407Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5485287Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5485294Z 
2025-12-04T11:11:26.5485569Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5485783Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5485909Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5486781Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5487003Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5487113Z graph_break []
2025-12-04T11:11:26.5487325Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5488054Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5488153Z   warnings.warn(
2025-12-04T11:11:26.5488857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5488965Z   warnings.warn(
2025-12-04T11:11:26.5489179Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5489288Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5489528Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5490400Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5490511Z graph_break []
2025-12-04T11:11:26.5490721Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5491428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5491540Z   warnings.warn(
2025-12-04T11:11:26.5492246Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5492354Z   warnings.warn(
2025-12-04T11:11:26.5492554Z =================================== FAILURES ===================================
2025-12-04T11:11:26.5493051Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5493185Z Traceback (most recent call last):
2025-12-04T11:11:26.5493719Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5493947Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5494406Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5494597Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5495133Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5495334Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5495463Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5495472Z 
2025-12-04T11:11:26.5495589Z Expected 1 but got 2.
2025-12-04T11:11:26.5495693Z Absolute difference: 1
2025-12-04T11:11:26.5495799Z Relative difference: 1.0
2025-12-04T11:11:26.5495816Z 
2025-12-04T11:11:26.5496028Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5496906Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5496914Z 
2025-12-04T11:11:26.5497189Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5497403Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5497515Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5498395Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5498616Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5498725Z graph_break []
2025-12-04T11:11:26.5498940Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5499655Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5499766Z   warnings.warn(
2025-12-04T11:11:26.5500473Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5500585Z   warnings.warn(
2025-12-04T11:11:26.5500798Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5501157Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5501399Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5502268Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5502381Z graph_break []
2025-12-04T11:11:26.5502592Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5503301Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5503416Z   warnings.warn(
2025-12-04T11:11:26.5504115Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5504211Z   warnings.warn(
2025-12-04T11:11:26.5504590Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5504704Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5504944Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5505851Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5505948Z graph_break []
2025-12-04T11:11:26.5506226Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5506935Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5507045Z   warnings.warn(
2025-12-04T11:11:26.5507756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5507851Z   warnings.warn(
2025-12-04T11:11:26.5508681Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-54023c099f6c1322.xml -
2025-12-04T11:11:26.5508852Z =========================== short test summary info ============================
2025-12-04T11:11:26.5509761Z FAILED [0.4494s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5509781Z 
2025-12-04T11:11:26.5509883Z Expected 1 but got 2.
2025-12-04T11:11:26.5509985Z Absolute difference: 1
2025-12-04T11:11:26.5510102Z Relative difference: 1.0
2025-12-04T11:11:26.5510107Z 
2025-12-04T11:11:26.5510318Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5511202Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5511219Z 
2025-12-04T11:11:26.5511484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5511662Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.5511868Z ================== 1 failed, 10 deselected, 2 rerun in 4.98s ===================
2025-12-04T11:11:26.5511965Z Got exit code 1
2025-12-04T11:11:26.5512070Z Retrying single test...
2025-12-04T11:11:26.5512523Z W1204 11:10:15.372000 93562 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.5513173Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ad9ca42cc99e9c7e.xml
2025-12-04T11:11:26.5513352Z ============================= test session starts ==============================
2025-12-04T11:11:26.5513695Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.5513802Z cachedir: .pytest_cache
2025-12-04T11:11:26.5514329Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.5514450Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.5514554Z configfile: pytest.ini
2025-12-04T11:11:26.5515097Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.5515322Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.5516300Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5516472Z Running 1 items in this shard
2025-12-04T11:11:26.5516477Z 
2025-12-04T11:11:26.5517720Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:10:21.669252805 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5517770Z 
2025-12-04T11:11:26.5518283Z [W1204 11:10:36.580901945 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5518319Z 
2025-12-04T11:11:26.5518825Z [W1204 11:10:36.581156043 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5518843Z 
2025-12-04T11:11:26.5519348Z [W1204 11:10:36.588359578 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5519353Z 
2025-12-04T11:11:26.5519858Z [W1204 11:10:36.589085609 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5519863Z 
2025-12-04T11:11:26.5520378Z [W1204 11:10:36.589325368 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5520383Z 
2025-12-04T11:11:26.5520883Z [W1204 11:10:36.596324120 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5520891Z 
2025-12-04T11:11:26.5521404Z [W1204 11:10:36.596982002 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5521409Z 
2025-12-04T11:11:26.5521973Z [W1204 11:10:36.597164018 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5521979Z 
2025-12-04T11:11:26.5522495Z [W1204 11:10:37.733489534 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5522499Z 
2025-12-04T11:11:26.5522997Z [W1204 11:10:37.735277807 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5523004Z 
2025-12-04T11:11:26.5523515Z [W1204 11:10:37.735484020 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5523522Z 
2025-12-04T11:11:26.5524018Z [W1204 11:10:37.739483708 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5524022Z 
2025-12-04T11:11:26.5524519Z [W1204 11:10:37.740280866 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5524524Z 
2025-12-04T11:11:26.5525040Z [W1204 11:10:37.740501741 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5525044Z 
2025-12-04T11:11:26.5525542Z [W1204 11:10:37.746585813 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5525549Z 
2025-12-04T11:11:26.5526063Z [W1204 11:10:37.747284312 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5526069Z 
2025-12-04T11:11:26.5526563Z [W1204 11:10:37.747481151 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5526568Z 
2025-12-04T11:11:26.5526712Z ('RERUN', {'yellow': True}) [20.0187s] [100%]
2025-12-04T11:11:26.5528035Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:10:37.157497058 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5528041Z 
2025-12-04T11:11:26.5528556Z [W1204 11:10:37.158311954 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5528591Z 
2025-12-04T11:11:26.5529093Z [W1204 11:10:37.158510427 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5529128Z 
2025-12-04T11:11:26.5529629Z [W1204 11:10:37.162653752 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5529649Z 
2025-12-04T11:11:26.5530146Z [W1204 11:10:37.163322911 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5530151Z 
2025-12-04T11:11:26.5530657Z [W1204 11:10:37.163513297 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5530663Z 
2025-12-04T11:11:26.5531177Z [W1204 11:10:37.169600099 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5531183Z 
2025-12-04T11:11:26.5531680Z [W1204 11:10:37.170349961 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5531685Z 
2025-12-04T11:11:26.5532200Z [W1204 11:10:37.170542399 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5532205Z 
2025-12-04T11:11:26.5532699Z [W1204 11:10:37.260154908 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5532705Z 
2025-12-04T11:11:26.5533222Z [W1204 11:10:37.260951843 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5533227Z 
2025-12-04T11:11:26.5533724Z [W1204 11:10:37.261158272 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5533731Z 
2025-12-04T11:11:26.5534246Z [W1204 11:10:37.265119827 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5534251Z 
2025-12-04T11:11:26.5534749Z [W1204 11:10:37.265778382 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5534757Z 
2025-12-04T11:11:26.5535254Z [W1204 11:10:37.265969013 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5535272Z 
2025-12-04T11:11:26.5535776Z [W1204 11:10:37.271996753 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5535781Z 
2025-12-04T11:11:26.5536280Z [W1204 11:10:37.272845146 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5536287Z 
2025-12-04T11:11:26.5536802Z [W1204 11:10:37.273037451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5536807Z 
2025-12-04T11:11:26.5536938Z ('RERUN', {'yellow': True}) [0.4856s] [100%]
2025-12-04T11:11:26.5538194Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:10:37.616179972 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5538199Z 
2025-12-04T11:11:26.5538758Z [W1204 11:10:37.616938580 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5538763Z 
2025-12-04T11:11:26.5539280Z [W1204 11:10:37.617136425 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5539312Z 
2025-12-04T11:11:26.5539928Z [W1204 11:10:38.621229329 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5539933Z 
2025-12-04T11:11:26.5540433Z [W1204 11:10:38.621876317 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5540489Z 
2025-12-04T11:11:26.5541067Z [W1204 11:10:38.622064129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5541072Z 
2025-12-04T11:11:26.5541656Z [W1204 11:10:38.628196395 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5541661Z 
2025-12-04T11:11:26.5542178Z [W1204 11:10:38.628875263 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5542185Z 
2025-12-04T11:11:26.5542684Z [W1204 11:10:38.629062907 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5542689Z 
2025-12-04T11:11:26.5543203Z [W1204 11:10:38.718976126 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5543210Z 
2025-12-04T11:11:26.5543708Z [W1204 11:10:38.719755108 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5543713Z 
2025-12-04T11:11:26.5544227Z [W1204 11:10:38.719958715 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5544235Z 
2025-12-04T11:11:26.5544732Z [W1204 11:10:38.723872454 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5544737Z 
2025-12-04T11:11:26.5545254Z [W1204 11:10:38.724502026 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5545259Z 
2025-12-04T11:11:26.5545756Z [W1204 11:10:38.724691670 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5545763Z 
2025-12-04T11:11:26.5546260Z [W1204 11:10:38.730638384 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5546279Z 
2025-12-04T11:11:26.5546778Z [W1204 11:10:38.731426634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5546783Z 
2025-12-04T11:11:26.5547280Z [W1204 11:10:38.731616363 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5547285Z 
2025-12-04T11:11:26.5547395Z FAILED [0.4556s] [100%]
2025-12-04T11:11:26.5547402Z 
2025-12-04T11:11:26.5547545Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.5548053Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5548179Z Traceback (most recent call last):
2025-12-04T11:11:26.5548679Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5548920Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5549374Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5549616Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5550158Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5550358Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5550530Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5550536Z 
2025-12-04T11:11:26.5550638Z Expected 1 but got 2.
2025-12-04T11:11:26.5550740Z Absolute difference: 1
2025-12-04T11:11:26.5550862Z Relative difference: 1.0
2025-12-04T11:11:26.5550899Z 
2025-12-04T11:11:26.5551108Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5552006Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5552012Z 
2025-12-04T11:11:26.5552279Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5552496Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5552621Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5553489Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5553726Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5553823Z graph_break []
2025-12-04T11:11:26.5554036Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5555224Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5555340Z   if out == self.unknown_value:
2025-12-04T11:11:26.5556065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5556170Z   warnings.warn(
2025-12-04T11:11:26.5556999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5557112Z   warnings.warn(
2025-12-04T11:11:26.5557608Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5557728Z Traceback (most recent call last):
2025-12-04T11:11:26.5558238Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5558466Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5558933Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5559093Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5559617Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5559837Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5559966Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5559972Z 
2025-12-04T11:11:26.5560092Z Expected 1 but got 2.
2025-12-04T11:11:26.5560195Z Absolute difference: 1
2025-12-04T11:11:26.5560304Z Relative difference: 1.0
2025-12-04T11:11:26.5560308Z 
2025-12-04T11:11:26.5560531Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5561548Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5561556Z 
2025-12-04T11:11:26.5561826Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5562056Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5562202Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5563084Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5563339Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5563434Z graph_break []
2025-12-04T11:11:26.5563663Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5564889Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5565018Z   if out == self.unknown_value:
2025-12-04T11:11:26.5565727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5565826Z   warnings.warn(
2025-12-04T11:11:26.5566541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5566641Z   warnings.warn(
2025-12-04T11:11:26.5566866Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5566976Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5567200Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5568085Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5568181Z graph_break []
2025-12-04T11:11:26.5568391Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5569112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5569211Z   warnings.warn(
2025-12-04T11:11:26.5569932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5570027Z   warnings.warn(
2025-12-04T11:11:26.5570168Z =================================== FAILURES ===================================
2025-12-04T11:11:26.5570682Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5570801Z Traceback (most recent call last):
2025-12-04T11:11:26.5571311Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5571543Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5571992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5572166Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5572694Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5572896Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5573039Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5573044Z 
2025-12-04T11:11:26.5573148Z Expected 1 but got 2.
2025-12-04T11:11:26.5573266Z Absolute difference: 1
2025-12-04T11:11:26.5573470Z Relative difference: 1.0
2025-12-04T11:11:26.5573475Z 
2025-12-04T11:11:26.5573685Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5574583Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5574619Z 
2025-12-04T11:11:26.5574881Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5575140Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5575253Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5576126Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5576364Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5576460Z graph_break []
2025-12-04T11:11:26.5576671Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5577865Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5577982Z   if out == self.unknown_value:
2025-12-04T11:11:26.5578707Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5578804Z   warnings.warn(
2025-12-04T11:11:26.5579508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5579616Z   warnings.warn(
2025-12-04T11:11:26.5579832Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5579956Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5580180Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5581046Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5581154Z graph_break []
2025-12-04T11:11:26.5581365Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5582085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5582181Z   warnings.warn(
2025-12-04T11:11:26.5582888Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5582996Z   warnings.warn(
2025-12-04T11:11:26.5583207Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5583320Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5583555Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5584423Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5584530Z graph_break []
2025-12-04T11:11:26.5584738Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5585447Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5585631Z   warnings.warn(
2025-12-04T11:11:26.5586334Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5586471Z   warnings.warn(
2025-12-04T11:11:26.5587296Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ad9ca42cc99e9c7e.xml -
2025-12-04T11:11:26.5587464Z =========================== short test summary info ============================
2025-12-04T11:11:26.5588422Z FAILED [0.4556s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5588427Z 
2025-12-04T11:11:26.5588529Z Expected 1 but got 2.
2025-12-04T11:11:26.5588648Z Absolute difference: 1
2025-12-04T11:11:26.5588756Z Relative difference: 1.0
2025-12-04T11:11:26.5588766Z 
2025-12-04T11:11:26.5588981Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5589879Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5589887Z 
2025-12-04T11:11:26.5590149Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5590343Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.5590540Z ================== 1 failed, 10 deselected, 2 rerun in 20.99s ==================
2025-12-04T11:11:26.5590638Z Got exit code 1
2025-12-04T11:11:26.5590758Z Retrying single test...
2025-12-04T11:11:26.5591198Z W1204 11:10:49.546000 93763 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.5591847Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-068870c4e7b35c60.xml
2025-12-04T11:11:26.5592024Z ============================= test session starts ==============================
2025-12-04T11:11:26.5592372Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.5592493Z cachedir: .pytest_cache
2025-12-04T11:11:26.5593002Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.5593126Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.5593244Z configfile: pytest.ini
2025-12-04T11:11:26.5593774Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.5593987Z collecting ... collected 58 items / 10 deselected / 48 selected
2025-12-04T11:11:26.5594964Z stepcurrent: skipping 10 already run items. Running only test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5595076Z Running 1 items in this shard
2025-12-04T11:11:26.5595083Z 
2025-12-04T11:11:26.5596335Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:10:55.821215965 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5596343Z 
2025-12-04T11:11:26.5596850Z [W1204 11:11:10.292030303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5596855Z 
2025-12-04T11:11:26.5597368Z [W1204 11:11:10.292286372 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5597374Z 
2025-12-04T11:11:26.5597933Z [W1204 11:11:10.299611925 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5597938Z 
2025-12-04T11:11:26.5598450Z [W1204 11:11:10.300372440 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5598489Z 
2025-12-04T11:11:26.5598991Z [W1204 11:11:10.300569683 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5599025Z 
2025-12-04T11:11:26.5599539Z [W1204 11:11:10.307707967 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5599543Z 
2025-12-04T11:11:26.5600038Z [W1204 11:11:10.308398037 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5600043Z 
2025-12-04T11:11:26.5600543Z [W1204 11:11:10.308584615 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5600574Z 
2025-12-04T11:11:26.5601286Z [W1204 11:11:10.443276451 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5601294Z 
2025-12-04T11:11:26.5601851Z [W1204 11:11:10.445016863 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5601859Z 
2025-12-04T11:11:26.5602377Z [W1204 11:11:10.445223560 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5602382Z 
2025-12-04T11:11:26.5602877Z [W1204 11:11:10.449091345 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5602882Z 
2025-12-04T11:11:26.5603398Z [W1204 11:11:10.449742303 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5603403Z 
2025-12-04T11:11:26.5603897Z [W1204 11:11:10.449936459 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5603903Z 
2025-12-04T11:11:26.5604412Z [W1204 11:11:10.455890000 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5604419Z 
2025-12-04T11:11:26.5604918Z [W1204 11:11:10.456532181 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5604923Z 
2025-12-04T11:11:26.5605421Z [W1204 11:11:10.456720159 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5605440Z 
2025-12-04T11:11:26.5605570Z ('RERUN', {'yellow': True}) [19.5562s] [100%]
2025-12-04T11:11:26.5606822Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:11:11.860333884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5606830Z 
2025-12-04T11:11:26.5607343Z [W1204 11:11:11.861078100 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5607350Z 
2025-12-04T11:11:26.5607851Z [W1204 11:11:11.861273815 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5607855Z 
2025-12-04T11:11:26.5608372Z [W1204 11:11:11.865252509 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5608376Z 
2025-12-04T11:11:26.5609020Z [W1204 11:11:11.865889876 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5609026Z 
2025-12-04T11:11:26.5609539Z [W1204 11:11:11.866076822 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5609589Z 
2025-12-04T11:11:26.5610090Z [W1204 11:11:11.872191154 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5610094Z 
2025-12-04T11:11:26.5610662Z [W1204 11:11:11.872829745 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5610667Z 
2025-12-04T11:11:26.5611165Z [W1204 11:11:11.873013405 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5611170Z 
2025-12-04T11:11:26.5611671Z [W1204 11:11:11.961162884 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5611690Z 
2025-12-04T11:11:26.5612188Z [W1204 11:11:11.961921465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5612195Z 
2025-12-04T11:11:26.5612691Z [W1204 11:11:11.962124034 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5612696Z 
2025-12-04T11:11:26.5613212Z [W1204 11:11:11.965964733 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5613218Z 
2025-12-04T11:11:26.5613720Z [W1204 11:11:11.966605255 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5613724Z 
2025-12-04T11:11:26.5614240Z [W1204 11:11:11.966794595 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5614246Z 
2025-12-04T11:11:26.5614743Z [W1204 11:11:11.972726591 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5614750Z 
2025-12-04T11:11:26.5615261Z [W1204 11:11:11.973523700 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5615266Z 
2025-12-04T11:11:26.5615762Z [W1204 11:11:11.973713672 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5615769Z 
2025-12-04T11:11:26.5615913Z ('RERUN', {'yellow': True}) [0.4788s] [100%]
2025-12-04T11:11:26.5617156Z inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 [W1204 11:11:11.313517402 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5617162Z 
2025-12-04T11:11:26.5617660Z [W1204 11:11:11.314256682 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5617667Z 
2025-12-04T11:11:26.5618180Z [W1204 11:11:11.314454074 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5618185Z 
2025-12-04T11:11:26.5618684Z [W1204 11:11:11.318404189 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5618691Z 
2025-12-04T11:11:26.5619198Z [W1204 11:11:11.319037525 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5619202Z 
2025-12-04T11:11:26.5619770Z [W1204 11:11:11.319225803 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5619775Z 
2025-12-04T11:11:26.5620294Z [W1204 11:11:11.325315639 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5620331Z 
2025-12-04T11:11:26.5620829Z [W1204 11:11:11.325963406 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5620834Z 
2025-12-04T11:11:26.5621344Z [W1204 11:11:11.326160273 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5621381Z 
2025-12-04T11:11:26.5621877Z [W1204 11:11:11.412730046 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5621882Z 
2025-12-04T11:11:26.5622381Z [W1204 11:11:11.413521404 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5622402Z 
2025-12-04T11:11:26.5622897Z [W1204 11:11:11.413727614 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5622901Z 
2025-12-04T11:11:26.5623400Z [W1204 11:11:11.417674313 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5623405Z 
2025-12-04T11:11:26.5623911Z [W1204 11:11:11.418344760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5623918Z 
2025-12-04T11:11:26.5624417Z [W1204 11:11:11.418539760 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5624421Z 
2025-12-04T11:11:26.5624931Z [W1204 11:11:11.424531634 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5624936Z 
2025-12-04T11:11:26.5625438Z [W1204 11:11:11.425383528 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5625442Z 
2025-12-04T11:11:26.5625950Z [W1204 11:11:11.425575465 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:11:26.5625957Z 
2025-12-04T11:11:26.5626053Z FAILED [0.4510s] [100%]
2025-12-04T11:11:26.5626058Z 
2025-12-04T11:11:26.5626197Z ==================================== RERUNS ====================================
2025-12-04T11:11:26.5626714Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5626834Z Traceback (most recent call last):
2025-12-04T11:11:26.5627352Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5627579Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5628036Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5628212Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5628742Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5628961Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5629093Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5629100Z 
2025-12-04T11:11:26.5629202Z Expected 1 but got 2.
2025-12-04T11:11:26.5629323Z Absolute difference: 1
2025-12-04T11:11:26.5629433Z Relative difference: 1.0
2025-12-04T11:11:26.5629438Z 
2025-12-04T11:11:26.5629650Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5630621Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5630627Z 
2025-12-04T11:11:26.5630892Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5631150Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5631262Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5632135Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5632407Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5632506Z graph_break []
2025-12-04T11:11:26.5632735Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5633925Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5634039Z   if out == self.unknown_value:
2025-12-04T11:11:26.5634767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5634867Z   warnings.warn(
2025-12-04T11:11:26.5635582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5635680Z   warnings.warn(
2025-12-04T11:11:26.5636172Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5636300Z Traceback (most recent call last):
2025-12-04T11:11:26.5636800Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5637025Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5637486Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5637647Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5638181Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5638387Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5638517Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5638522Z 
2025-12-04T11:11:26.5638636Z Expected 1 but got 2.
2025-12-04T11:11:26.5638739Z Absolute difference: 1
2025-12-04T11:11:26.5638854Z Relative difference: 1.0
2025-12-04T11:11:26.5638859Z 
2025-12-04T11:11:26.5639069Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5639956Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5639961Z 
2025-12-04T11:11:26.5640239Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5640457Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5640585Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5641514Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5641745Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5641861Z graph_break []
2025-12-04T11:11:26.5642075Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5643350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5643494Z   if out == self.unknown_value:
2025-12-04T11:11:26.5644205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5644353Z   warnings.warn(
2025-12-04T11:11:26.5645058Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5645154Z   warnings.warn(
2025-12-04T11:11:26.5645384Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5645495Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5645734Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5646602Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5646697Z graph_break []
2025-12-04T11:11:26.5646921Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5647629Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5647742Z   warnings.warn(
2025-12-04T11:11:26.5648442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5648537Z   warnings.warn(
2025-12-04T11:11:26.5648693Z =================================== FAILURES ===================================
2025-12-04T11:11:26.5649185Z _ TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 _
2025-12-04T11:11:26.5649305Z Traceback (most recent call last):
2025-12-04T11:11:26.5649813Z   File "/var/lib/jenkins/workspace/test/inductor/test_cuda_select_algorithm.py", line 130, in test_int8_woq_mm_cuda
2025-12-04T11:11:26.5650039Z     self.assertEqual(counters["inductor"]["woq_matcher_count"], 1)
2025-12-04T11:11:26.5650503Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/test_case.py", line 113, in assertEqual
2025-12-04T11:11:26.5650663Z     return super().assertEqual(x, y, *args, **kwargs)
2025-12-04T11:11:26.5651189Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4284, in assertEqual
2025-12-04T11:11:26.5651402Z     raise error_metas.pop()[0].to_error(  # type: ignore[index]
2025-12-04T11:11:26.5651535Z AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5651541Z 
2025-12-04T11:11:26.5651654Z Expected 1 but got 2.
2025-12-04T11:11:26.5651758Z Absolute difference: 1
2025-12-04T11:11:26.5651863Z Relative difference: 1.0
2025-12-04T11:11:26.5651870Z 
2025-12-04T11:11:26.5652093Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5652972Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5652980Z 
2025-12-04T11:11:26.5653244Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5653468Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5653579Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5654521Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5654745Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5654959Z graph_break []
2025-12-04T11:11:26.5655184Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5656359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/constant_folding.py:256: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:11:26.5656520Z   if out == self.unknown_value:
2025-12-04T11:11:26.5657227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5657323Z   warnings.warn(
2025-12-04T11:11:26.5658048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5658144Z   warnings.warn(
2025-12-04T11:11:26.5658369Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5658482Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5658704Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5659632Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5659731Z graph_break []
2025-12-04T11:11:26.5659943Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5660665Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5660762Z   warnings.warn(
2025-12-04T11:11:26.5661479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5661577Z   warnings.warn(
2025-12-04T11:11:26.5661787Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:11:26.5661913Z stats [('calls_captured', 6)]
2025-12-04T11:11:26.5662136Z aot_autograd [('total', 2), ('autograd_cache_bypass', 2), ('not_ok', 2)]
2025-12-04T11:11:26.5663016Z inductor [('pattern_matcher_nodes', 16), ('woq_matcher_nodes', 12), ('pattern_matcher_count', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 2), ('fxgraph_cache_miss', 2), ('woq_matcher_count', 2), ('pad_mm_bench', 1)]
2025-12-04T11:11:26.5663110Z graph_break []
2025-12-04T11:11:26.5663320Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:11:26.5664049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5664150Z   warnings.warn(
2025-12-04T11:11:26.5664852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:11:26.5664960Z   warnings.warn(
2025-12-04T11:11:26.5665776Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-068870c4e7b35c60.xml -
2025-12-04T11:11:26.5665963Z =========================== short test summary info ============================
2025-12-04T11:11:26.5666939Z FAILED [0.4510s] inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16 - AssertionError: Scalars are not equal!
2025-12-04T11:11:26.5666946Z 
2025-12-04T11:11:26.5667061Z Expected 1 but got 2.
2025-12-04T11:11:26.5667166Z Absolute difference: 1
2025-12-04T11:11:26.5667273Z Relative difference: 1.0
2025-12-04T11:11:26.5667278Z 
2025-12-04T11:11:26.5667540Z To execute this test, run the following from the base repo dir:
2025-12-04T11:11:26.5668421Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5668458Z 
2025-12-04T11:11:26.5668723Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:11:26.5668913Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:11:26.5669107Z ================== 1 failed, 10 deselected, 2 rerun in 20.52s ==================
2025-12-04T11:11:26.5669213Z Got exit code 1
2025-12-04T11:11:26.5670023Z FAILED CONSISTENTLY: test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16
2025-12-04T11:11:26.5670427Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:11:26.5670879Z W1204 11:11:23.200000 93964 site-packages/torch/_inductor/utils.py:1703] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:11:26.5671526Z Test results will be stored in test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a0d0acf02d82ecbb.xml
2025-12-04T11:11:26.5671705Z ============================= test session starts ==============================
2025-12-04T11:11:26.5672048Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:11:26.5672154Z cachedir: .pytest_cache
2025-12-04T11:11:26.5672677Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:11:26.5672798Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:11:26.5672903Z configfile: pytest.ini
2025-12-04T11:11:26.5673447Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, typeguard-4.3.0
2025-12-04T11:11:26.5673664Z collecting ... collected 58 items / 11 deselected / 47 selected
2025-12-04T11:11:26.5673817Z stepcurrent: skipping 11 already run items.
2025-12-04T11:11:26.5673928Z Running 0 items in this shard
2025-12-04T11:11:26.5673935Z 
2025-12-04T11:11:26.5674768Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a0d0acf02d82ecbb.xml -
2025-12-04T11:11:26.5674947Z ============================ 11 deselected in 0.02s ============================
2025-12-04T11:11:26.5683547Z The following tests failed consistently: ['test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_17_mid_dim_8_in_features_144_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_1_in_features_144_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_1_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_1024_out_features_65_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_1_in_features_128_out_features_64_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_1024_cuda_bfloat16', 'test/inductor/test_cuda_select_algorithm.py::TestSelectAlgorithmCudaCUDA::test_int8_woq_mm_cuda_batch_size_32_mid_dim_8_in_features_128_out_features_65_cuda_bfloat16']
2025-12-04T11:11:26.5683613Z 
2025-12-04T11:11:26.5684251Z FINISHED PRINTING LOG FILE of inductor/test_cuda_select_algorithm 4/5 (test/test-reports/inductor.test_cuda_select_algorithm_4.5_53b34f2889361847_.log)
2025-12-04T11:11:26.5684257Z 
2025-12-04T11:11:26.5684658Z Finished inductor/test_cuda_select_algorithm 4/5 ... [2025-12-04 11:11:26.269401][7043.879300637], took 16.03min
2025-12-04T11:11:26.5685540Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c40e88b21f3dd767.xml
2025-12-04T11:11:26.5686434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9074e5af9f7e7d92.xml
2025-12-04T11:11:26.5687309Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-10ff13c663ad5077.xml
2025-12-04T11:11:26.5688193Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee0de851594c228e.xml
2025-12-04T11:11:26.5689067Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eb93cd35b9ecccb8.xml
2025-12-04T11:11:26.5689986Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63eb31d4436f1164.xml
2025-12-04T11:11:26.5690854Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8fe2f36a52fbcf80.xml
2025-12-04T11:11:26.5960380Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cee8502954df528c.xml
2025-12-04T11:11:26.6301239Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-48bbd6d243994e17.xml
2025-12-04T11:11:26.6615752Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-04bee3cdcda101b6.xml
2025-12-04T11:11:26.6966662Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0653410d18e9d78e.xml
2025-12-04T11:11:26.7282693Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34a9d39084dff1b6.xml
2025-12-04T11:11:26.7578162Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c9ee3a2d8186602.xml
2025-12-04T11:11:26.7929962Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-126fca4cd7b29c10.xml
2025-12-04T11:11:26.8239319Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eddfed0d2b029629.xml
2025-12-04T11:11:26.8560350Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b22078b8c085cdcd.xml
2025-12-04T11:11:26.8876953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38e32e50c56cc24f.xml
2025-12-04T11:11:26.9201296Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d85417ecba0abe7a.xml
2025-12-04T11:11:26.9678268Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1802f570a905faf5.xml
2025-12-04T11:11:26.9992832Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca420a576680224b.xml
2025-12-04T11:11:27.0329314Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9a9f08c6e10d54f7.xml
2025-12-04T11:11:27.0596937Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c59271afe170d67.xml
2025-12-04T11:11:27.0915220Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb71b131031d8408.xml
2025-12-04T11:11:27.1221571Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-565cf24db94440d1.xml
2025-12-04T11:11:27.2064848Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-607f169455f7ccc0.xml
2025-12-04T11:11:27.2377141Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-531db397873a40b2.xml
2025-12-04T11:11:27.2699223Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a6ed46f8a6f71ef7.xml
2025-12-04T11:11:27.3020490Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ab81f77c2cb5952.xml
2025-12-04T11:11:27.3308733Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38d9a64e046ee91f.xml
2025-12-04T11:11:27.3601387Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-747a72e37803dfe4.xml
2025-12-04T11:11:27.3924223Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-54023c099f6c1322.xml
2025-12-04T11:11:27.4231702Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ad9ca42cc99e9c7e.xml
2025-12-04T11:11:27.4530622Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-068870c4e7b35c60.xml
2025-12-04T11:11:27.4836938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a0d0acf02d82ecbb.xml
2025-12-04T11:11:27.9082924Z Uploading logs for 57119749427 to S3
2025-12-04T11:11:28.0854722Z Uploading artifacts took 0.58 seconds
2025-12-04T11:11:28.0855137Z inductor/test_cuda_select_algorithm 4/5 failed!
2025-12-04T11:11:28.0859539Z Running inductor/test_deterministic 1/8 ... [2025-12-04 11:11:28.085779][7045.695685551]
2025-12-04T11:11:28.0860118Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:11:28.0864878Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_deterministic.py', '--shard-id=1', '--num-shards=8', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:11:28.086241]
2025-12-04T11:11:37.8151842Z 
2025-12-04T11:11:37.8152811Z inductor/test_deterministic 1/8 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_deterministic_1.8_262bcacfdd50a1f9_.log
2025-12-04T11:11:37.8155983Z Running 3 items in this shard: test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_BertForMaskedLM_training_or_inference_inference_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_training_precision_amp, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_GoogleFnet_training_or_inference_training_precision_float16
2025-12-04T11:11:37.8158430Z 
2025-12-04T11:11:37.8158802Z Finished inductor/test_deterministic 1/8 ... [2025-12-04 11:11:37.814980][7055.424886886], took 0.16min
2025-12-04T11:11:37.8235930Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-a2f9525a35872883.xml
2025-12-04T11:11:37.8992281Z Running inductor/test_deterministic 6/8 ... [2025-12-04 11:11:37.898915][7055.508822753]
2025-12-04T11:11:37.8992897Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:11:37.8995813Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_deterministic.py', '--shard-id=6', '--num-shards=8', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:11:37.899342]
2025-12-04T11:13:00.5350614Z 
2025-12-04T11:13:00.5351893Z inductor/test_deterministic 6/8 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_deterministic_6.8_b1bfd086dab71470_.log
2025-12-04T11:13:00.5354579Z Running 2 items in this shard: test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_inference_precision_float16, test/inductor/test_deterministic.py::DeterministicTest::test_run2run_determinism_model_name_DistillGPT2_training_or_inference_training_precision_bfloat16
2025-12-04T11:13:00.5356470Z 
2025-12-04T11:13:00.5356878Z Finished inductor/test_deterministic 6/8 ... [2025-12-04 11:13:00.534832][7138.144741414], took 1.38min
2025-12-04T11:13:00.5435879Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-6b09493f63855de7.xml
2025-12-04T11:13:00.6122887Z Running inductor/test_extension_backend 1/1 ... [2025-12-04 11:13:00.611950][7138.221857845]
2025-12-04T11:13:00.6123493Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:13:00.6126192Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_extension_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:13:00.612374]
2025-12-04T11:13:16.2995610Z 
2025-12-04T11:13:16.2996778Z inductor/test_extension_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_extension_backend_1.1_057698d7e9793b3b_.log
2025-12-04T11:13:16.2998434Z Running 1 items in this shard: test/inductor/test_extension_backend.py::ExtensionBackendTests::test_open_device_registration
2025-12-04T11:13:16.2999145Z 
2025-12-04T11:13:16.2999621Z Finished inductor/test_extension_backend 1/1 ... [2025-12-04 11:13:16.299304][7153.909213266], took 0.26min
2025-12-04T11:13:16.3080752Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_extension_backend/inductor.test_extension_backend-107c721ddd062adf.xml
2025-12-04T11:13:16.3953562Z Running inductor/test_native_matmul 1/2 ... [2025-12-04 11:13:16.395011][7154.004918051]
2025-12-04T11:13:16.3954155Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:13:16.3957073Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_native_matmul.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:13:16.395448]
2025-12-04T11:23:38.4956219Z 
2025-12-04T11:23:38.4957168Z PRINTING LOG FILE of inductor/test_native_matmul 1/2 (test/test-reports/inductor.test_native_matmul_1.2_d47deb602d378eb1_.log)
2025-12-04T11:23:38.4958529Z Test results will be stored in test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-6880425f749978d6.xml
2025-12-04T11:23:38.4959508Z ============================= test session starts ==============================
2025-12-04T11:23:38.4960246Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:23:38.4960967Z cachedir: .pytest_cache
2025-12-04T11:23:38.4961742Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:23:38.4962643Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:23:38.4962990Z configfile: pytest.ini
2025-12-04T11:23:38.4963829Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:23:38.4964752Z collecting ... collected 8 items
2025-12-04T11:23:38.4965217Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T11:23:38.4968139Z Running 6 items in this shard: test/inductor/test_native_matmul.py::TestTritonDotReduction::test_3mm_add, test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul, test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16, test/inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_1d_expand, test/inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_2_expand, test/inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_complex
2025-12-04T11:23:38.4970832Z 
2025-12-04T11:23:38.4971279Z inductor/test_native_matmul.py::TestTritonDotReduction::test_3mm_add PASSED [119.1228s] [ 16%]
2025-12-04T11:23:38.4972228Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul PASSED [24.2448s] [ 33%]
2025-12-04T11:23:38.4973675Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 E1204 11:16:24.956000 95265 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001
2025-12-04T11:23:38.4974821Z ('RERUN', {'yellow': True}) [36.4116s] [ 50%]
2025-12-04T11:23:38.4976000Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 E1204 11:17:01.365000 95265 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001
2025-12-04T11:23:38.4977136Z ('RERUN', {'yellow': True}) [36.3815s] [ 50%]
2025-12-04T11:23:38.4978297Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 E1204 11:17:37.776000 95265 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001
2025-12-04T11:23:38.4979816Z FAILED [36.4097s] [ 50%]
2025-12-04T11:23:38.4980068Z 
2025-12-04T11:23:38.4980212Z ==================================== RERUNS ====================================
2025-12-04T11:23:38.4980837Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________
2025-12-04T11:23:38.4981504Z Traceback (most recent call last):
2025-12-04T11:23:38.4982170Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16
2025-12-04T11:23:38.4982923Z     self._check_equal(f, (x, y))
2025-12-04T11:23:38.4983692Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal
2025-12-04T11:23:38.4984438Z     self.assertTrue(same(expect, actual))
2025-12-04T11:23:38.4985081Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:23:38.4985738Z     raise self.failureException(msg)
2025-12-04T11:23:38.4986106Z AssertionError: False is not true
2025-12-04T11:23:38.4986410Z 
2025-12-04T11:23:38.4986625Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.4987517Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.4988220Z 
2025-12-04T11:23:38.4988483Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.4989191Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.4989723Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.4990105Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.4990766Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.4992476Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.4993956Z graph_break []
2025-12-04T11:23:38.4994372Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________
2025-12-04T11:23:38.4994960Z Traceback (most recent call last):
2025-12-04T11:23:38.4995610Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16
2025-12-04T11:23:38.4996286Z     self._check_equal(f, (x, y))
2025-12-04T11:23:38.4996895Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal
2025-12-04T11:23:38.4997557Z     self.assertTrue(same(expect, actual))
2025-12-04T11:23:38.4998115Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:23:38.4998772Z     raise self.failureException(msg)
2025-12-04T11:23:38.4999135Z AssertionError: False is not true
2025-12-04T11:23:38.4999355Z 
2025-12-04T11:23:38.4999569Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.5000393Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.5001210Z 
2025-12-04T11:23:38.5001474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.5002158Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5002614Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5002987Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5003583Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5005208Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5006839Z graph_break []
2025-12-04T11:23:38.5007216Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5007678Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5008034Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5008668Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5010306Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5011766Z graph_break []
2025-12-04T11:23:38.5012064Z =================================== FAILURES ===================================
2025-12-04T11:23:38.5012595Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________
2025-12-04T11:23:38.5013115Z Traceback (most recent call last):
2025-12-04T11:23:38.5013782Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16
2025-12-04T11:23:38.5014445Z     self._check_equal(f, (x, y))
2025-12-04T11:23:38.5015064Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal
2025-12-04T11:23:38.5015724Z     self.assertTrue(same(expect, actual))
2025-12-04T11:23:38.5016305Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:23:38.5016875Z     raise self.failureException(msg)
2025-12-04T11:23:38.5017240Z AssertionError: False is not true
2025-12-04T11:23:38.5017464Z 
2025-12-04T11:23:38.5017692Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.5018508Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.5019215Z 
2025-12-04T11:23:38.5019481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.5020101Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5020565Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5020930Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5021520Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5023147Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5024572Z graph_break []
2025-12-04T11:23:38.5024922Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5025385Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5025753Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5026328Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5027943Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5029348Z graph_break []
2025-12-04T11:23:38.5029717Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5030167Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5030529Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5031116Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5032812Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5034222Z graph_break []
2025-12-04T11:23:38.5035149Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-6880425f749978d6.xml -
2025-12-04T11:23:38.5036193Z =========================== short test summary info ============================
2025-12-04T11:23:38.5037054Z FAILED [36.4097s] inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 - AssertionError: False is not true
2025-12-04T11:23:38.5037698Z 
2025-12-04T11:23:38.5037908Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.5038729Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.5039352Z 
2025-12-04T11:23:38.5039614Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.5040192Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:23:38.5040689Z =============== 1 failed, 2 passed, 2 rerun in 252.60s (0:04:12) ===============
2025-12-04T11:23:38.5041123Z Got exit code 1
2025-12-04T11:23:38.5041384Z Retrying single test...
2025-12-04T11:23:38.5042187Z Test results will be stored in test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-469bba077eb48143.xml
2025-12-04T11:23:38.5043064Z ============================= test session starts ==============================
2025-12-04T11:23:38.5043771Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:23:38.5044367Z cachedir: .pytest_cache
2025-12-04T11:23:38.5045055Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:23:38.5045918Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:23:38.5046269Z configfile: pytest.ini
2025-12-04T11:23:38.5047023Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:23:38.5047924Z collecting ... collected 8 items / 5 deselected / 3 selected
2025-12-04T11:23:38.5048817Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16
2025-12-04T11:23:38.5049624Z Running 1 items in this shard
2025-12-04T11:23:38.5049832Z 
2025-12-04T11:23:38.5050716Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:18:29.785385087 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:23:38.5051713Z 
2025-12-04T11:23:38.5052149Z E1204 11:18:45.387000 96145 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001
2025-12-04T11:23:38.5052858Z ('RERUN', {'yellow': True}) [56.7083s] [100%]
2025-12-04T11:23:38.5053984Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:19:21.109170003 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:23:38.5054979Z 
2025-12-04T11:23:38.5055424Z E1204 11:19:21.670000 96145 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001
2025-12-04T11:23:38.5056104Z ('RERUN', {'yellow': True}) [36.1587s] [100%]
2025-12-04T11:23:38.5057227Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:19:57.218156732 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:23:38.5058228Z 
2025-12-04T11:23:38.5058661Z E1204 11:19:57.776000 96145 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001
2025-12-04T11:23:38.5059415Z FAILED [36.1040s] [100%]
2025-12-04T11:23:38.5059599Z 
2025-12-04T11:23:38.5059737Z ==================================== RERUNS ====================================
2025-12-04T11:23:38.5060283Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________
2025-12-04T11:23:38.5061682Z Traceback (most recent call last):
2025-12-04T11:23:38.5062344Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16
2025-12-04T11:23:38.5063004Z     self._check_equal(f, (x, y))
2025-12-04T11:23:38.5063651Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal
2025-12-04T11:23:38.5064311Z     self.assertTrue(same(expect, actual))
2025-12-04T11:23:38.5064871Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:23:38.5065448Z     raise self.failureException(msg)
2025-12-04T11:23:38.5065815Z AssertionError: False is not true
2025-12-04T11:23:38.5066035Z 
2025-12-04T11:23:38.5066262Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.5067070Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.5067692Z 
2025-12-04T11:23:38.5067951Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.5068574Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5069024Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5069395Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5070756Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5072275Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5072811Z graph_break []
2025-12-04T11:23:38.5073185Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:23:38.5074789Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:23:38.5076440Z   (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel(
2025-12-04T11:23:38.5077100Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________
2025-12-04T11:23:38.5077613Z Traceback (most recent call last):
2025-12-04T11:23:38.5078266Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16
2025-12-04T11:23:38.5078937Z     self._check_equal(f, (x, y))
2025-12-04T11:23:38.5079535Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal
2025-12-04T11:23:38.5080190Z     self.assertTrue(same(expect, actual))
2025-12-04T11:23:38.5080756Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:23:38.5081318Z     raise self.failureException(msg)
2025-12-04T11:23:38.5081678Z AssertionError: False is not true
2025-12-04T11:23:38.5081896Z 
2025-12-04T11:23:38.5082193Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.5083027Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.5083641Z 
2025-12-04T11:23:38.5083904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.5084530Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5084994Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5085352Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5086803Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5088354Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5088897Z graph_break []
2025-12-04T11:23:38.5089254Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:23:38.5090892Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:23:38.5092533Z   (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel(
2025-12-04T11:23:38.5093164Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5093619Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5093989Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5094578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5096207Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5097716Z graph_break []
2025-12-04T11:23:38.5098016Z =================================== FAILURES ===================================
2025-12-04T11:23:38.5098560Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________
2025-12-04T11:23:38.5099070Z Traceback (most recent call last):
2025-12-04T11:23:38.5099717Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16
2025-12-04T11:23:38.5100388Z     self._check_equal(f, (x, y))
2025-12-04T11:23:38.5101149Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal
2025-12-04T11:23:38.5101802Z     self.assertTrue(same(expect, actual))
2025-12-04T11:23:38.5102373Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:23:38.5102953Z     raise self.failureException(msg)
2025-12-04T11:23:38.5103320Z AssertionError: False is not true
2025-12-04T11:23:38.5103541Z 
2025-12-04T11:23:38.5103752Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.5104574Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.5105184Z 
2025-12-04T11:23:38.5105459Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.5106074Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5106541Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5106917Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5108280Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5109775Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5110310Z graph_break []
2025-12-04T11:23:38.5110677Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:23:38.5112405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:23:38.5114038Z   (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel(
2025-12-04T11:23:38.5114665Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5115171Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5115543Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5116122Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5117792Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5119201Z graph_break []
2025-12-04T11:23:38.5119570Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5120025Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5120403Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5120992Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5122669Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5124081Z graph_break []
2025-12-04T11:23:38.5124983Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-469bba077eb48143.xml -
2025-12-04T11:23:38.5126027Z =========================== short test summary info ============================
2025-12-04T11:23:38.5126854Z FAILED [36.1040s] inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 - AssertionError: False is not true
2025-12-04T11:23:38.5127510Z 
2025-12-04T11:23:38.5127718Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.5128538Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.5129149Z 
2025-12-04T11:23:38.5129422Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.5129985Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:23:38.5130511Z ============= 1 failed, 5 deselected, 2 rerun in 129.00s (0:02:09) =============
2025-12-04T11:23:38.5130954Z Got exit code 1
2025-12-04T11:23:38.5131215Z Retrying single test...
2025-12-04T11:23:38.5131960Z Test results will be stored in test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-4093a29cc92449a3.xml
2025-12-04T11:23:38.5132820Z ============================= test session starts ==============================
2025-12-04T11:23:38.5133461Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:23:38.5134032Z cachedir: .pytest_cache
2025-12-04T11:23:38.5134728Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:23:38.5135493Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:23:38.5135834Z configfile: pytest.ini
2025-12-04T11:23:38.5136577Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:23:38.5137560Z collecting ... collected 8 items / 5 deselected / 3 selected
2025-12-04T11:23:38.5138441Z stepcurrent: skipping 2 already run items. Running only test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16
2025-12-04T11:23:38.5139240Z Running 1 items in this shard
2025-12-04T11:23:38.5139445Z 
2025-12-04T11:23:38.5140410Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:20:48.605987339 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:23:38.5141440Z 
2025-12-04T11:23:38.5141894Z E1204 11:21:05.123000 96746 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001
2025-12-04T11:23:38.5142588Z ('RERUN', {'yellow': True}) [56.5570s] [100%]
2025-12-04T11:23:38.5143715Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:21:41.854814777 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:23:38.5144751Z 
2025-12-04T11:23:38.5145186Z E1204 11:21:41.416000 96746 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001
2025-12-04T11:23:38.5145888Z ('RERUN', {'yellow': True}) [36.1641s] [100%]
2025-12-04T11:23:38.5147001Z inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 [W1204 11:22:17.014751479 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:23:38.5148004Z 
2025-12-04T11:23:38.5148437Z E1204 11:22:17.572000 96746 site-packages/torch/_dynamo/utils.py:3241] Accuracy failed: allclose not within tol=0.0001
2025-12-04T11:23:38.5149107Z FAILED [36.1550s] [100%]
2025-12-04T11:23:38.5149287Z 
2025-12-04T11:23:38.5149438Z ==================================== RERUNS ====================================
2025-12-04T11:23:38.5149966Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________
2025-12-04T11:23:38.5150486Z Traceback (most recent call last):
2025-12-04T11:23:38.5151153Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16
2025-12-04T11:23:38.5151828Z     self._check_equal(f, (x, y))
2025-12-04T11:23:38.5152431Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal
2025-12-04T11:23:38.5153089Z     self.assertTrue(same(expect, actual))
2025-12-04T11:23:38.5153662Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:23:38.5154229Z     raise self.failureException(msg)
2025-12-04T11:23:38.5154592Z AssertionError: False is not true
2025-12-04T11:23:38.5154813Z 
2025-12-04T11:23:38.5155056Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.5155879Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.5156497Z 
2025-12-04T11:23:38.5156757Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.5157378Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5157846Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5158201Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5159569Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5161088Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5161641Z graph_break []
2025-12-04T11:23:38.5161997Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:23:38.5163671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:23:38.5165323Z   (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel(
2025-12-04T11:23:38.5166116Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________
2025-12-04T11:23:38.5166622Z Traceback (most recent call last):
2025-12-04T11:23:38.5167286Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16
2025-12-04T11:23:38.5168003Z     self._check_equal(f, (x, y))
2025-12-04T11:23:38.5168619Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal
2025-12-04T11:23:38.5169273Z     self.assertTrue(same(expect, actual))
2025-12-04T11:23:38.5169881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:23:38.5170464Z     raise self.failureException(msg)
2025-12-04T11:23:38.5170816Z AssertionError: False is not true
2025-12-04T11:23:38.5171048Z 
2025-12-04T11:23:38.5171256Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.5172071Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.5172682Z 
2025-12-04T11:23:38.5172954Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.5173553Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5174022Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5174389Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5175747Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5177246Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5177790Z graph_break []
2025-12-04T11:23:38.5178155Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:23:38.5179760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:23:38.5181398Z   (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel(
2025-12-04T11:23:38.5182025Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5182489Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5182846Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5183435Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5185067Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5186484Z graph_break []
2025-12-04T11:23:38.5186768Z =================================== FAILURES ===================================
2025-12-04T11:23:38.5187311Z ___________________ TestTritonDotReduction.test_matmul_fp16 ____________________
2025-12-04T11:23:38.5187831Z Traceback (most recent call last):
2025-12-04T11:23:38.5188489Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 96, in test_matmul_fp16
2025-12-04T11:23:38.5189150Z     self._check_equal(f, (x, y))
2025-12-04T11:23:38.5189757Z   File "/var/lib/jenkins/workspace/test/inductor/test_native_matmul.py", line 30, in _check_equal
2025-12-04T11:23:38.5190412Z     self.assertTrue(same(expect, actual))
2025-12-04T11:23:38.5190970Z   File "/opt/conda/envs/py_3.10/lib/python3.10/unittest/case.py", line 687, in assertTrue
2025-12-04T11:23:38.5191546Z     raise self.failureException(msg)
2025-12-04T11:23:38.5191907Z AssertionError: False is not true
2025-12-04T11:23:38.5192131Z 
2025-12-04T11:23:38.5192439Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.5193256Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.5193911Z 
2025-12-04T11:23:38.5194173Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.5194794Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5195246Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5195645Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5197004Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('fxgraph_cache_miss', 1), ('async_compile_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5198714Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5199250Z graph_break []
2025-12-04T11:23:38.5199620Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:23:38.5201391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/runtime/static_cuda_launcher.py:155: UserWarning: Unsupported unwinding pattern: Address not in range (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/profiler/unwind/unwind.cpp:219.)
2025-12-04T11:23:38.5203108Z   (self.function, self.n_regs, self.n_spills) = _StaticCudaLauncher._load_kernel(
2025-12-04T11:23:38.5203727Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5204195Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5204568Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5205158Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5206775Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5208189Z graph_break []
2025-12-04T11:23:38.5208553Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:23:38.5209014Z frames [('total', 1), ('ok', 1)]
2025-12-04T11:23:38.5209365Z stats [('calls_captured', 2), ('unique_graphs', 1)]
2025-12-04T11:23:38.5209955Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:23:38.5211570Z inductor [('triton_bundler_save_kernel', 152), ('benchmarking.InductorBenchmarker.benchmark', 4), ('benchmarking.InductorBenchmarker.benchmark_gpu', 4), ('async_compile_cache_miss', 2), ('fxgraph_cache_miss', 1), ('async_compile_cache_hit', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:23:38.5212970Z graph_break []
2025-12-04T11:23:38.5213855Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-4093a29cc92449a3.xml -
2025-12-04T11:23:38.5214892Z =========================== short test summary info ============================
2025-12-04T11:23:38.5215728Z FAILED [36.1550s] inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16 - AssertionError: False is not true
2025-12-04T11:23:38.5216369Z 
2025-12-04T11:23:38.5216593Z To execute this test, run the following from the base repo dir:
2025-12-04T11:23:38.5217405Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_native_matmul.py TestTritonDotReduction.test_matmul_fp16
2025-12-04T11:23:38.5218025Z 
2025-12-04T11:23:38.5218288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:23:38.5218867Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:23:38.5219524Z ============= 1 failed, 5 deselected, 2 rerun in 128.90s (0:02:08) =============
2025-12-04T11:23:38.5219977Z Got exit code 1
2025-12-04T11:23:38.5220532Z FAILED CONSISTENTLY: test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16
2025-12-04T11:23:38.5221511Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:23:38.5222603Z Test results will be stored in test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-f7b8d41d555aa509.xml
2025-12-04T11:23:38.5223511Z ============================= test session starts ==============================
2025-12-04T11:23:38.5224154Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:23:38.5224734Z cachedir: .pytest_cache
2025-12-04T11:23:38.5225415Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:23:38.5226182Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:23:38.5226529Z configfile: pytest.ini
2025-12-04T11:23:38.5227270Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:23:38.5228188Z collecting ... collected 8 items / 3 deselected / 5 selected
2025-12-04T11:23:38.5228663Z stepcurrent: skipping 3 already run items.
2025-12-04T11:23:38.5229039Z Running 3 items in this shard
2025-12-04T11:23:38.5229242Z 
2025-12-04T11:23:38.5229639Z inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_1d_expand PASSED [27.8324s] [ 33%]
2025-12-04T11:23:38.5230557Z inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_2_expand PASSED [13.6980s] [ 66%]
2025-12-04T11:23:38.5231465Z inductor/test_native_matmul.py::TestTritonDotReduction::test_mm_complex PASSED [26.4024s] [100%]
2025-12-04T11:23:38.5231978Z 
2025-12-04T11:23:38.5232733Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-f7b8d41d555aa509.xml -
2025-12-04T11:23:38.5233792Z ================== 3 passed, 3 deselected in 67.96s (0:01:07) ==================
2025-12-04T11:23:38.5234636Z The following tests failed consistently: ['test/inductor/test_native_matmul.py::TestTritonDotReduction::test_matmul_fp16']
2025-12-04T11:23:38.5235275Z 
2025-12-04T11:23:38.5235841Z FINISHED PRINTING LOG FILE of inductor/test_native_matmul 1/2 (test/test-reports/inductor.test_native_matmul_1.2_d47deb602d378eb1_.log)
2025-12-04T11:23:38.5236525Z 
2025-12-04T11:23:38.5236892Z Finished inductor/test_native_matmul 1/2 ... [2025-12-04 11:23:38.495378][7776.105283319], took 10.37min
2025-12-04T11:23:38.5238154Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-6880425f749978d6.xml
2025-12-04T11:23:38.5798255Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-469bba077eb48143.xml
2025-12-04T11:23:38.6118434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-4093a29cc92449a3.xml
2025-12-04T11:23:38.6419370Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-f7b8d41d555aa509.xml
2025-12-04T11:23:39.0400394Z Uploading logs for 57119749427 to S3
2025-12-04T11:23:39.1048343Z Uploading artifacts took 0.42 seconds
2025-12-04T11:23:39.1048727Z inductor/test_native_matmul 1/2 failed!
2025-12-04T11:23:39.1053412Z Running dynamo/test_fx_graph_runnable 1/1 ... [2025-12-04 11:23:39.105178][7776.715086643]
2025-12-04T11:23:39.1054077Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:23:39.1058946Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fx_graph_runnable.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:23:39.105615]
2025-12-04T11:26:34.4791980Z 
2025-12-04T11:26:34.4793148Z dynamo/test_fx_graph_runnable 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fx_graph_runnable_1.1_bc88b60e43fe7f12_.log
2025-12-04T11:26:34.4802797Z Running 17 items in this shard: test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_all_gather_collective, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_all_reduce_collective, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_basic_tensor_add, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_broadcast_add_dynamic, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_broadcast_collective, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_dtensor_compile_redistribute, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_dynamic_expression, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_dynamic_shapes_run, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_metrics_context, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_reduce_scatter_collective, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_scalar_multiply, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_toy_model_basic, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_toy_model_batch_processing, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_toy_model_dynamic_batch, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_two_inputs_matmul, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_user_defined_triton_kernel, test/dynamo/test_fx_graph_runnable.py::FxGraphRunnableTest::test_user_defined_triton_kernel_autotune
2025-12-04T11:26:34.4811299Z 
2025-12-04T11:26:34.4811672Z Finished dynamo/test_fx_graph_runnable 1/1 ... [2025-12-04 11:26:34.478961][7952.088870292], took 2.92min
2025-12-04T11:26:34.4885625Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_fx_graph_runnable/dynamo.test_fx_graph_runnable-0790c18290928611.xml
2025-12-04T11:26:34.7188956Z Running inductor/test_memory 1/1 ... [2025-12-04 11:26:34.718580][7952.328487815]
2025-12-04T11:26:34.7189499Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:26:34.7192502Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_memory.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:26:34.719015]
2025-12-04T11:27:59.7004184Z 
2025-12-04T11:27:59.7005219Z PRINTING LOG FILE of inductor/test_memory 1/1 (test/test-reports/inductor.test_memory_1.1_18f1e5893f70119e_.log)
2025-12-04T11:27:59.7006522Z Test results will be stored in test-reports/python-pytest/inductor.test_memory/inductor.test_memory-692fd365c2b33f50.xml
2025-12-04T11:27:59.7007771Z ============================= test session starts ==============================
2025-12-04T11:27:59.7008898Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:27:59.7009796Z cachedir: .pytest_cache
2025-12-04T11:27:59.7010767Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:27:59.7011918Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:27:59.7012446Z configfile: pytest.ini
2025-12-04T11:27:59.7013731Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:27:59.7015126Z collecting ... collected 8 items
2025-12-04T11:27:59.7015696Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T11:27:59.7021536Z Running 8 items in this shard: test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_fusing_reductions_increase_peak_memory, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_fusion_acc_large_reads, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_multiple_mutations_of_buf, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_bfs, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_dfs, test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_lpmf
2025-12-04T11:27:59.7025859Z 
2025-12-04T11:27:59.7026865Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_fusing_reductions_increase_peak_memory W1204 11:26:47.605000 100522 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:27:59.7028104Z PASSED [5.4475s] [ 12%]
2025-12-04T11:27:59.7028757Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_fusion_acc_large_reads PASSED [1.5106s] [ 25%]
2025-12-04T11:27:59.7029834Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_multiple_mutations_of_buf PASSED [0.6490s] [ 37%]
2025-12-04T11:27:59.7031025Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation ('RERUN', {'yellow': True}) [1.6213s] [ 50%]
2025-12-04T11:27:59.7032297Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation ('RERUN', {'yellow': True}) [1.5562s] [ 50%]
2025-12-04T11:27:59.7033494Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation FAILED [1.7637s] [ 50%]
2025-12-04T11:27:59.7034104Z 
2025-12-04T11:27:59.7034256Z ==================================== RERUNS ====================================
2025-12-04T11:27:59.7034868Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________
2025-12-04T11:27:59.7035434Z Traceback (most recent call last):
2025-12-04T11:27:59.7036141Z   File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation
2025-12-04T11:27:59.7036921Z     self.assertEqual(buffer_info[pre][0:2], (2048, 2048))
2025-12-04T11:27:59.7038524Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.7040177Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.7040647Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.7040994Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.7041410Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.7042209Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.7072598Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.7103121Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.7121226Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.7140585Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.7142307Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.7143386Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7144324Z   warnings.warn(
2025-12-04T11:27:59.7145197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7146144Z   warnings.warn(
2025-12-04T11:27:59.7147012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7147939Z   warnings.warn(
2025-12-04T11:27:59.7149466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7151479Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7153253Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7154752Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7156610Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7158678Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7160478Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7162005Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7162776Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________
2025-12-04T11:27:59.7163357Z Traceback (most recent call last):
2025-12-04T11:27:59.7164065Z   File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation
2025-12-04T11:27:59.7164844Z     self.assertEqual(buffer_info[pre][0:2], (2048, 2048))
2025-12-04T11:27:59.7166760Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.7168430Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.7168907Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.7169259Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.7169829Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.7170525Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.7201170Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.7232381Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.7250422Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.7270365Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.7272175Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.7273262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7274216Z   warnings.warn(
2025-12-04T11:27:59.7275080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7276021Z   warnings.warn(
2025-12-04T11:27:59.7277074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7278034Z   warnings.warn(
2025-12-04T11:27:59.7279561Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7281628Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7283491Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7285063Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7286929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7288999Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7290760Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7292250Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7292844Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.7293315Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.7293642Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.7294072Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.7294758Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.7325518Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.7355568Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.7373457Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.7392678Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.7394396Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.7395479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7396425Z   warnings.warn(
2025-12-04T11:27:59.7397292Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7398242Z   warnings.warn(
2025-12-04T11:27:59.7399113Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7400041Z   warnings.warn(
2025-12-04T11:27:59.7401828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7403911Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7405731Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7407272Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7409069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7411133Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7412895Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7414383Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7414895Z =================================== FAILURES ===================================
2025-12-04T11:27:59.7415504Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________
2025-12-04T11:27:59.7416079Z Traceback (most recent call last):
2025-12-04T11:27:59.7416776Z   File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation
2025-12-04T11:27:59.7417547Z     self.assertEqual(buffer_info[pre][0:2], (2048, 2048))
2025-12-04T11:27:59.7419164Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.7420812Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.7421282Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.7421618Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.7422048Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.7422735Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.7453165Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.7483411Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.7501520Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.7520842Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.7522656Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.7523741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7524683Z   warnings.warn(
2025-12-04T11:27:59.7525563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7526510Z   warnings.warn(
2025-12-04T11:27:59.7527364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7528314Z   warnings.warn(
2025-12-04T11:27:59.7529851Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7531861Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7533638Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7535124Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7536932Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7539087Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7540857Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7542390Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7543003Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.7543473Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.7543823Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.7544239Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.7544932Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.7579488Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.7610113Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.7628077Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.7647365Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.7649140Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.7650222Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7651222Z   warnings.warn(
2025-12-04T11:27:59.7652082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7653057Z   warnings.warn(
2025-12-04T11:27:59.7653918Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7654848Z   warnings.warn(
2025-12-04T11:27:59.7656376Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7658376Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7660152Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7661646Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7663444Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7665500Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7667261Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7668756Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7669350Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.7669802Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.7670148Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.7670578Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.7671261Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.7701951Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.7732199Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.7750025Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.7769343Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.7771061Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.7772123Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7773077Z   warnings.warn(
2025-12-04T11:27:59.7773951Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7774895Z   warnings.warn(
2025-12-04T11:27:59.7775744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7776681Z   warnings.warn(
2025-12-04T11:27:59.7778209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7780217Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7782030Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7783531Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7785383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7787439Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7789231Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7790743Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7791781Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-692fd365c2b33f50.xml -
2025-12-04T11:27:59.7792746Z =========================== short test summary info ============================
2025-12-04T11:27:59.7794938Z FAILED [1.7637s] inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation - KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.7797086Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:27:59.7797581Z ==================== 1 failed, 3 passed, 2 rerun in 12.58s =====================
2025-12-04T11:27:59.7798008Z Got exit code 1
2025-12-04T11:27:59.7798271Z Retrying single test...
2025-12-04T11:27:59.7798931Z Test results will be stored in test-reports/python-pytest/inductor.test_memory/inductor.test_memory-8c32992e913c2c64.xml
2025-12-04T11:27:59.7799720Z ============================= test session starts ==============================
2025-12-04T11:27:59.7800369Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:27:59.7801105Z cachedir: .pytest_cache
2025-12-04T11:27:59.7801784Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:27:59.7802621Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:27:59.7802972Z configfile: pytest.ini
2025-12-04T11:27:59.7803719Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:27:59.7804642Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T11:27:59.7817317Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation
2025-12-04T11:27:59.7818379Z Running 1 items in this shard
2025-12-04T11:27:59.7818593Z 
2025-12-04T11:27:59.7819572Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation W1204 11:27:11.801000 100899 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:27:59.7820747Z ('RERUN', {'yellow': True}) [7.0158s] [100%]
2025-12-04T11:27:59.7821594Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation ('RERUN', {'yellow': True}) [1.5736s] [100%]
2025-12-04T11:27:59.7822792Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation FAILED [1.5633s] [100%]
2025-12-04T11:27:59.7823582Z 
2025-12-04T11:27:59.7823738Z ==================================== RERUNS ====================================
2025-12-04T11:27:59.7824335Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________
2025-12-04T11:27:59.7824917Z Traceback (most recent call last):
2025-12-04T11:27:59.7825614Z   File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation
2025-12-04T11:27:59.7826446Z     self.assertEqual(buffer_info[pre][0:2], (2048, 2048))
2025-12-04T11:27:59.7828052Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.7829772Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.7830243Z frames [('total', 505), ('ok', 489)]
2025-12-04T11:27:59.7830643Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.7831188Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.7831884Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.7862271Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.7892407Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.7910506Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.7929888Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.7931623Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.7932743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7933700Z   warnings.warn(
2025-12-04T11:27:59.7934563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7935564Z   warnings.warn(
2025-12-04T11:27:59.7936437Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.7937410Z   warnings.warn(
2025-12-04T11:27:59.7938919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7940917Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7942689Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7944184Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7945989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.7948036Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.7949809Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.7951308Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.7952008Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________
2025-12-04T11:27:59.7952578Z Traceback (most recent call last):
2025-12-04T11:27:59.7953275Z   File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation
2025-12-04T11:27:59.7954054Z     self.assertEqual(buffer_info[pre][0:2], (2048, 2048))
2025-12-04T11:27:59.7955674Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.7957303Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.7957773Z frames [('total', 505), ('ok', 489)]
2025-12-04T11:27:59.7958117Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.7958666Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.7959390Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.7989740Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.8020100Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.8038060Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.8057454Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.8059171Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.8060251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8061208Z   warnings.warn(
2025-12-04T11:27:59.8062082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8063017Z   warnings.warn(
2025-12-04T11:27:59.8063882Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8064820Z   warnings.warn(
2025-12-04T11:27:59.8066392Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8068382Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8070197Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8071718Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8073524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8075608Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8077374Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8078847Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8079444Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.8079911Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.8080243Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.8080676Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.8081371Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.8112084Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.8142167Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.8160021Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.8179245Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.8181009Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.8182089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8183045Z   warnings.warn(
2025-12-04T11:27:59.8183902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8184851Z   warnings.warn(
2025-12-04T11:27:59.8185710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8186647Z   warnings.warn(
2025-12-04T11:27:59.8188167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8190170Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8191940Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8193447Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8195235Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8197297Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8199065Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8200554Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8201271Z =================================== FAILURES ===================================
2025-12-04T11:27:59.8201871Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________
2025-12-04T11:27:59.8202676Z Traceback (most recent call last):
2025-12-04T11:27:59.8203382Z   File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation
2025-12-04T11:27:59.8204163Z     self.assertEqual(buffer_info[pre][0:2], (2048, 2048))
2025-12-04T11:27:59.8205822Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.8207465Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.8207979Z frames [('total', 505), ('ok', 489)]
2025-12-04T11:27:59.8208327Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.8208867Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.8209607Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.8239911Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.8270146Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.8288493Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.8307956Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.8309682Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.8310766Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8311728Z   warnings.warn(
2025-12-04T11:27:59.8312626Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8313569Z   warnings.warn(
2025-12-04T11:27:59.8314428Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8315409Z   warnings.warn(
2025-12-04T11:27:59.8316915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8318977Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8320749Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8322319Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8324120Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8326166Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8327944Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8329441Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8330032Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.8330490Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.8330836Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.8331267Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.8331962Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.8362293Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.8392528Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.8410621Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.8429897Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.8431624Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.8432701Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8433641Z   warnings.warn(
2025-12-04T11:27:59.8434509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8435457Z   warnings.warn(
2025-12-04T11:27:59.8436310Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8437253Z   warnings.warn(
2025-12-04T11:27:59.8438785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8440811Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8442642Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8444129Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8445978Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8448045Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8449845Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8451344Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8451972Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.8452447Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.8452795Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.8453244Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.8453941Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.8484314Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.8514658Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.8532697Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.8552378Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.8554100Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.8555218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8556175Z   warnings.warn(
2025-12-04T11:27:59.8557049Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8558011Z   warnings.warn(
2025-12-04T11:27:59.8558874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8559850Z   warnings.warn(
2025-12-04T11:27:59.8561380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8563483Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8565264Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8566765Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8568564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8570632Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8572385Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8573883Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8574925Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-8c32992e913c2c64.xml -
2025-12-04T11:27:59.8575901Z =========================== short test summary info ============================
2025-12-04T11:27:59.8578098Z FAILED [1.5633s] inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation - KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.8580235Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:27:59.8580745Z ================== 1 failed, 7 deselected, 2 rerun in 10.18s ===================
2025-12-04T11:27:59.8581180Z Got exit code 1
2025-12-04T11:27:59.8581448Z Retrying single test...
2025-12-04T11:27:59.8582173Z Test results will be stored in test-reports/python-pytest/inductor.test_memory/inductor.test_memory-73235157d9df4ae2.xml
2025-12-04T11:27:59.8582972Z ============================= test session starts ==============================
2025-12-04T11:27:59.8583623Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:27:59.8584198Z cachedir: .pytest_cache
2025-12-04T11:27:59.8584894Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:27:59.8585716Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:27:59.8586063Z configfile: pytest.ini
2025-12-04T11:27:59.8586807Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:27:59.8587753Z collecting ... collected 8 items / 7 deselected / 1 selected
2025-12-04T11:27:59.8588753Z stepcurrent: skipping 3 already run items. Running only test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation
2025-12-04T11:27:59.8589683Z Running 1 items in this shard
2025-12-04T11:27:59.8589891Z 
2025-12-04T11:27:59.8590825Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation W1204 11:27:32.824000 101187 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:27:59.8592017Z ('RERUN', {'yellow': True}) [7.0124s] [100%]
2025-12-04T11:27:59.8592857Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation ('RERUN', {'yellow': True}) [1.5637s] [100%]
2025-12-04T11:27:59.8594068Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation FAILED [1.5494s] [100%]
2025-12-04T11:27:59.8594685Z 
2025-12-04T11:27:59.8594826Z ==================================== RERUNS ====================================
2025-12-04T11:27:59.8595431Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________
2025-12-04T11:27:59.8596013Z Traceback (most recent call last):
2025-12-04T11:27:59.8596701Z   File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation
2025-12-04T11:27:59.8597476Z     self.assertEqual(buffer_info[pre][0:2], (2048, 2048))
2025-12-04T11:27:59.8599098Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.8600742Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.8601416Z frames [('total', 505), ('ok', 489)]
2025-12-04T11:27:59.8601770Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.8602401Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.8603105Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.8633506Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.8663830Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.8681691Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.8701239Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.8702969Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.8704060Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8705003Z   warnings.warn(
2025-12-04T11:27:59.8705872Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8707004Z   warnings.warn(
2025-12-04T11:27:59.8707861Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8708797Z   warnings.warn(
2025-12-04T11:27:59.8710324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8712338Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8714105Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8715596Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8717521Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8719586Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8721396Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8722993Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8723679Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________
2025-12-04T11:27:59.8724312Z Traceback (most recent call last):
2025-12-04T11:27:59.8725009Z   File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation
2025-12-04T11:27:59.8725790Z     self.assertEqual(buffer_info[pre][0:2], (2048, 2048))
2025-12-04T11:27:59.8727435Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.8729091Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.8729560Z frames [('total', 505), ('ok', 489)]
2025-12-04T11:27:59.8729905Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.8730444Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.8731142Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.8761451Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.8791595Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.8809769Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.8829152Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.8830914Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.8831995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8832999Z   warnings.warn(
2025-12-04T11:27:59.8833860Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8834808Z   warnings.warn(
2025-12-04T11:27:59.8835672Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8836612Z   warnings.warn(
2025-12-04T11:27:59.8838119Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8840126Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8841899Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8843498Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8845293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8847344Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8849117Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8850611Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8851208Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.8851663Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.8852009Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.8852436Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.8853127Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.8883468Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.8913704Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.8931881Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.8951455Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.8953195Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.8954267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8955221Z   warnings.warn(
2025-12-04T11:27:59.8956091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8957035Z   warnings.warn(
2025-12-04T11:27:59.8957898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.8958836Z   warnings.warn(
2025-12-04T11:27:59.8960424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8962519Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8964343Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8965829Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8967628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.8969766Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.8971538Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.8973032Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.8973543Z =================================== FAILURES ===================================
2025-12-04T11:27:59.8974155Z _______ TestOperatorReorderForPeakMemory.test_mutation_size_propagation ________
2025-12-04T11:27:59.8974852Z Traceback (most recent call last):
2025-12-04T11:27:59.8975537Z   File "/var/lib/jenkins/workspace/test/inductor/test_memory.py", line 317, in test_mutation_size_propagation
2025-12-04T11:27:59.8976316Z     self.assertEqual(buffer_info[pre][0:2], (2048, 2048))
2025-12-04T11:27:59.8977934Z KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.8979587Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.8980042Z frames [('total', 505), ('ok', 489)]
2025-12-04T11:27:59.8980389Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.8980940Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.8981635Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.9012129Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.9042768Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.9060949Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.9080659Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.9082506Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.9083602Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.9084554Z   warnings.warn(
2025-12-04T11:27:59.9085438Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.9086399Z   warnings.warn(
2025-12-04T11:27:59.9087271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.9088213Z   warnings.warn(
2025-12-04T11:27:59.9089743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.9091768Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.9093578Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.9095110Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.9096914Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.9099061Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.9100977Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.9102489Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.9103177Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.9103656Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.9104007Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.9104439Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.9105165Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.9135756Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.9165949Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.9183821Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. 
Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. 
Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.9203284Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.9205012Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.9206163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.9207119Z   warnings.warn(
2025-12-04T11:27:59.9208000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.9208932Z   warnings.warn(
2025-12-04T11:27:59.9209829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.9210770Z   warnings.warn(
2025-12-04T11:27:59.9212328Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.9214374Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.9216138Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.9217635Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.9219442Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.9221515Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.9223283Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.9224777Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.9225370Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:27:59.9225837Z frames [('total', 464), ('ok', 448)]
2025-12-04T11:27:59.9226169Z stats [('calls_captured', 30)]
2025-12-04T11:27:59.9226600Z aot_autograd [('total', 3), ('autograd_cache_miss', 3), ('not_ok', 3)]
2025-12-04T11:27:59.9227292Z inductor [('pattern_matcher_count', 21), ('pattern_matcher_nodes', 21), ('fxgraph_cache_miss', 3)]
2025-12-04T11:27:59.9257740Z graph_break [("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 2), ('Attempted to call function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `BaseBackend.get_arg_specialization` in file `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/compiler.py` should not be traced.\n  Hint: Avoid calling the function `BaseBackend.get_arg_specialization`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `BaseBackend.get_arg_specialization` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: module: triton.backends.compiler, qualname: BaseBackend.get_arg_specialization, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CUDABackend.parse_options` should not be traced.\n  Hint: Avoid calling the function `CUDABackend.parse_options`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CUDABackend.parse_options` to force tracing into the function. 
More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CUDABackend.parse_options, name: parse_options, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/compiler.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: _hashlib, qualname: openssl_sha256, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ('Attempted to call function marked as skipped\n  Explanation: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. 
_warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).\n  Hint: If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.\n  Hint: If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.\n\n  Developer debug context: module: <unknown module>, qualname: pybind11_object.__new__, skip reason: <missing reason>\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0007.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `set_loc` of class `builder`\n  Hint: Avoid calling `builder.set_loc` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) set_loc [ConstantVariable(str: '/var/lib/jenkins/workspace/test/inductor/test_memory.py'), ConstantVariable(int: 259), ConstantVariable(int: 0)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `create_module` of class `builder`\n  Hint: Avoid calling `builder.create_module` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedObjectVariable(builder) create_module [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ('Attempted to inline function marked as skipped\n  Explanation: Dynamo developers have intentionally marked that the function `CudaLauncher.__init__` should not be traced.\n  Hint: Avoid calling the 
function `CudaLauncher.__init__`.\n  Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `CudaLauncher.__init__` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.\n  Hint: Please file an issue to PyTorch.\n\n  Developer debug context: qualname: CudaLauncher.__init__, name: __init__, filename: `/opt/conda/envs/py_3.10/lib/python3.10/site-packages/triton/backends/nvidia/driver.py`, skip reason: skipped according trace_rules.lookup SKIP_DIRS\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0008.html', 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), arg_names)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), arg_names) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1)]
2025-12-04T11:27:59.9288441Z aten_mm_info [('aten.mm_32_32_32', 4)]
2025-12-04T11:27:59.9306530Z unimplemented [('Attempt to trace generator\n  Explanation: Generators cannot be compiled directly with `torch.compile`.\n  Hint: Call a generator from inside of a non-generator Python function and compile that function instead.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n\n  Developer debug context: \n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0003.html', 11), ('Data-dependent branching\n  Explanation: Detected data-dependent branching (e.g. `if my_tensor.sum() > 0:`). Dynamo does not support tracing dynamic control flow.\n  Hint: This graph break is fundamental - it is unlikely that Dynamo will ever be able to trace through your code. Consider finding a workaround.\n  Hint: Use `torch.cond` to express dynamic control flow.\n\n  Developer debug context: attempted to jump with GetAttrVariable(TritonKernelVariable(), debug)\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0170.html', 1), ("Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__rmod__` of class `_TensorMeta`\n  Hint: Avoid calling `_TensorMeta.__rmod__` in your code.\n  Hint: Please report an issue to PyTorch.\n\n  Developer debug context: call_method UserDefinedClassVariable(<class 'torch.Tensor'>) __rmod__ [ConstantVariable(str: 'Unsupported type: %s')] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html", 1), ('Unsupported hasattr call\n  Explanation: Dynamo does not know how to trace the function `GetAttrVariable(TritonKernelVariable(), params)`\n  Hint: Avoid calling `hasattr(GetAttrVariable, __iter__)` in your code.\n  Hint: It may be possible to write Dynamo tracing rules for this code. Please report an issue to PyTorch if you encounter this graph break often and it is causing performance issues.\n\n  Developer debug context: call_obj_hasattr GetAttrVariable(TritonKernelVariable(), params) __iter__\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0150.html', 1), ('Unsupported method call\n  Explanation: Dynamo does not know how to trace method `__next__` of class `list_iterator`\n  Hint: Avoid calling `list_iterator.__next__` in your code.\n  Hint: Please report an issue to PyTorch.\n  Hint: Dynamo does not fully support tracing builtin iterators (e.g. `map`, `zip`, `enumerate`) passed in from uncompiled to compiled regions (e.g. `torch.compile(fn)(enumerate(...))`). This can happen unintentionally if a previous graph break happens with a builtin iterator in the local scope.\n  Hint: List/dict comprehensions in Python <= 3.11 result in implicit function calls, which Dynamo cannot trace as a top level frame. Possible workarounds are (1) use a loop instead of a comprehension, (2) fix any graph breaks in the function above the comprehension, (3) wrap the comprehension in a function, or (4) use Python 3.12+.\n\n  Developer debug context: call_method UserDefinedObjectVariable(list_iterator) __next__ [] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0156.html', 1), ("Builtin `operator.*` comparison with constant `self` failed\n  Explanation: Failed to compare ConstantVariable(str: 'triton.language.extra.libdevice') with GetAttrVariable(ConstDictVariable(), __module__), because GetAttrVariable(ConstDictVariable(), __module__) is not a Python constant or its mutation check fails.\n\n\n  Developer debug context: call_method ConstantVariable(str: 'triton.language.extra.libdevice') __eq__ [GetAttrVariable(ConstDictVariable(), __module__)] {}\n\n For more details about this graph break, please visit: https://meta-pytorch.github.io/compile-graph-break-site/gb/gb0023.html", 1)]
2025-12-04T11:27:59.9325992Z resumes [('torch_dynamo_resume_in_dynamic_func_at_4', 3), ('torch_dynamo_resume_in__pack_args_at_687', 1), ('torch_dynamo_resume_in___init___at_49', 1), ('torch_dynamo_resume_in___init___at_308', 1), ('torch_dynamo_resume_in___init___at_315', 1), ('torch_dynamo_resume_in___init___at_322', 1), ('torch_dynamo_resume_in_launch_metadata_at_490', 1), ('torch_dynamo_resume_in_launch_metadata_at_494', 1)]
2025-12-04T11:27:59.9327731Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T11:27:59.9328806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.9329759Z   warnings.warn(
2025-12-04T11:27:59.9330623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.9331567Z   warnings.warn(
2025-12-04T11:27:59.9332429Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T11:27:59.9333375Z   warnings.warn(
2025-12-04T11:27:59.9334899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `_hashlib.openssl_sha256.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.9336906Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.9338682Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.9340189Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.9342050Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py:1933: UserWarning: Dynamo does not know how to trace the builtin `<unknown module>.pybind11_object.__new__.` This function is either a Python builtin (e.g. _warnings.warn) or a third-party C/C++ Python extension (perhaps created with pybind).
2025-12-04T11:27:59.9344119Z If it is a Python builtin, please file an issue on GitHub so the PyTorch team can add support for it and see the next case for a workaround.
2025-12-04T11:27:59.9345891Z If it is a third-party C/C++ Python extension, please either wrap it into a PyTorch-understood custom operator (see https://pytorch.org/tutorials/advanced/custom_ops_landing_page.html for more details) or, if it is traceable, use `torch.compiler.allow_in_graph`.
2025-12-04T11:27:59.9347428Z   torch._dynamo.utils.warn_once(explanation + "\n" + "\n".join(hints))
2025-12-04T11:27:59.9348468Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-73235157d9df4ae2.xml -
2025-12-04T11:27:59.9349455Z =========================== short test summary info ============================
2025-12-04T11:27:59.9351643Z FAILED [1.5494s] inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation - KeyError: 'buf0\n\nTo execute this test, run the following from the base repo dir:\n    PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_memory.py TestOperatorReorderForPeakMemory.test_mutation_size_propagation\n\nThis message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0'
2025-12-04T11:27:59.9353848Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:27:59.9354360Z ================== 1 failed, 7 deselected, 2 rerun in 10.15s ===================
2025-12-04T11:27:59.9354795Z Got exit code 1
2025-12-04T11:27:59.9355448Z FAILED CONSISTENTLY: test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation
2025-12-04T11:27:59.9356491Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:27:59.9357525Z Test results will be stored in test-reports/python-pytest/inductor.test_memory/inductor.test_memory-9741d261d282c9ae.xml
2025-12-04T11:27:59.9358305Z ============================= test session starts ==============================
2025-12-04T11:27:59.9358955Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:27:59.9359547Z cachedir: .pytest_cache
2025-12-04T11:27:59.9360243Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:27:59.9360999Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:27:59.9361349Z configfile: pytest.ini
2025-12-04T11:27:59.9362191Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:27:59.9363115Z collecting ... collected 8 items / 4 deselected / 4 selected
2025-12-04T11:27:59.9363578Z stepcurrent: skipping 4 already run items.
2025-12-04T11:27:59.9363956Z Running 4 items in this shard
2025-12-04T11:27:59.9364162Z 
2025-12-04T11:27:59.9365085Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory W1204 11:27:54.628000 101475 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:27:59.9366206Z PASSED [6.2993s] [ 25%]
2025-12-04T11:27:59.9366858Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_bfs PASSED [0.7725s] [ 50%]
2025-12-04T11:27:59.9367936Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_dfs PASSED [0.7749s] [ 75%]
2025-12-04T11:27:59.9369014Z inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_reorder_peak_memory_lpmf PASSED [0.7747s] [100%]
2025-12-04T11:27:59.9369619Z 
2025-12-04T11:27:59.9370282Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-9741d261d282c9ae.xml -
2025-12-04T11:27:59.9371265Z ======================= 4 passed, 4 deselected in 8.65s ========================
2025-12-04T11:27:59.9372256Z The following tests failed consistently: ['test/inductor/test_memory.py::TestOperatorReorderForPeakMemory::test_mutation_size_propagation']
2025-12-04T11:27:59.9373006Z 
2025-12-04T11:27:59.9373499Z FINISHED PRINTING LOG FILE of inductor/test_memory 1/1 (test/test-reports/inductor.test_memory_1.1_18f1e5893f70119e_.log)
2025-12-04T11:27:59.9374101Z 
2025-12-04T11:27:59.9374428Z Finished inductor/test_memory 1/1 ... [2025-12-04 11:27:59.702035][8037.311935471], took 1.42min
2025-12-04T11:27:59.9375617Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-692fd365c2b33f50.xml
2025-12-04T11:27:59.9377232Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-8c32992e913c2c64.xml
2025-12-04T11:27:59.9378829Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-73235157d9df4ae2.xml
2025-12-04T11:27:59.9380411Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-9741d261d282c9ae.xml
2025-12-04T11:28:00.2543104Z Uploading logs for 57119749427 to S3
2025-12-04T11:28:00.3313217Z Uploading artifacts took 0.37 seconds
2025-12-04T11:28:00.3313614Z inductor/test_memory 1/1 failed!
2025-12-04T11:28:00.3317898Z Running dynamo/test_streams 1/1 ... [2025-12-04 11:28:00.331613][8037.941519721]
2025-12-04T11:28:00.3318470Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:28:00.3323409Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_streams.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:28:00.332078]
2025-12-04T11:28:18.2227873Z 
2025-12-04T11:28:18.2228785Z dynamo/test_streams 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_streams_1.1_834a989fad2ef2e3_.log
2025-12-04T11:28:18.2239113Z Running 28 items in this shard: test/dynamo/test_streams.py::TestStreams::test_current_stream_api, test/dynamo/test_streams.py::TestStreams::test_event_tracing, test/dynamo/test_streams.py::TestStreams::test_event_weakref, test/dynamo/test_streams.py::TestStreams::test_get_current_stream_return, test/dynamo/test_streams.py::TestStreams::test_get_current_stream_return_different_device, test/dynamo/test_streams.py::TestStreams::test_get_current_stream_return_no_index, test/dynamo/test_streams.py::TestStreams::test_inductor_lowering, test/dynamo/test_streams.py::TestStreams::test_is_marked_side_effectful, test/dynamo/test_streams.py::TestStreams::test_local_stream_enter_exit, test/dynamo/test_streams.py::TestStreams::test_local_stream_nested_enter_exit, test/dynamo/test_streams.py::TestStreams::test_local_stream_return, test/dynamo/test_streams.py::TestStreams::test_nested_stream_enter_exit, test/dynamo/test_streams.py::TestStreams::test_nested_stream_enter_exit_graph_break, test/dynamo/test_streams.py::TestStreams::test_new_event_api, test/dynamo/test_streams.py::TestStreams::test_new_stream_api, test/dynamo/test_streams.py::TestStreams::test_record_stream_problem_basic, test/dynamo/test_streams.py::TestStreams::test_record_stream_problem_interleaved, test/dynamo/test_streams.py::TestStreams::test_run_opcheck_fork_join, test/dynamo/test_streams.py::TestStreams::test_run_opcheck_wait_record, test/dynamo/test_streams.py::TestStreams::test_run_opcheck_wait_record_stream, test/dynamo/test_streams.py::TestStreams::test_stream_backward_simple, test/dynamo/test_streams.py::TestStreams::test_stream_backward_sync, test/dynamo/test_streams.py::TestStreams::test_stream_context_graph_break, test/dynamo/test_streams.py::TestStreams::test_stream_enter_exit, test/dynamo/test_streams.py::TestStreams::test_stream_enter_exit_graph_break, test/dynamo/test_streams.py::TestStreams::test_stream_input, test/dynamo/test_streams.py::TestStreams::test_stream_weakref, test/dynamo/test_streams.py::TestStreams::test_stream_with_mutation
2025-12-04T11:28:18.2248896Z 
2025-12-04T11:28:18.2249226Z Finished dynamo/test_streams 1/1 ... [2025-12-04 11:28:18.222576][8055.832485802], took 0.30min
2025-12-04T11:28:18.2326619Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_streams/dynamo.test_streams-061202c25215a4da.xml
2025-12-04T11:28:18.3227569Z Running inductor/test_unbacked_symints 1/1 ... [2025-12-04 11:28:18.322409][8055.932315723]
2025-12-04T11:28:18.3228203Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:28:18.3230717Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_unbacked_symints.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:28:18.322816]
2025-12-04T11:31:57.3170389Z 
2025-12-04T11:31:57.3171611Z PRINTING LOG FILE of inductor/test_unbacked_symints 1/1 (test/test-reports/inductor.test_unbacked_symints_1.1_e6e3a96590269886_.log)
2025-12-04T11:31:57.3173927Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-ad02460068a39927.xml
2025-12-04T11:31:57.3175262Z ============================= test session starts ==============================
2025-12-04T11:31:57.3176251Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:31:57.3177099Z cachedir: .pytest_cache
2025-12-04T11:31:57.3178023Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:31:57.3179036Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:31:57.3179437Z configfile: pytest.ini
2025-12-04T11:31:57.3180256Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:31:57.3181290Z collecting ... collected 32 items
2025-12-04T11:31:57.3181833Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T11:31:57.3208509Z Running 32 items in this shard: test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_autotune_with_unbacked_stride_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_autotuning_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_broadcast_tensors_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_combo_kernel_size_hint_failure_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_einsum_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_equivalent_backed_unbacked_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_expand_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_expand_ok_with_runtime_assert_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_issue_143498_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_addmm_False_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_addmm_True_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_bmm_False_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_bmm_True_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_mm_False_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_mm_True_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_nonzero_in_inference_mode_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_softmax_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_split_with_sizes_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_to_int_with_unbacked_size_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_triton_kernel_grid_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_triton_kernel_with_unbacked_symint_fallback_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_linear_layer_norm_input_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_masked_scatter_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_range_tree_divisor_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_repeat_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic2_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic_False_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic_True_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_vertical_pointwise_reduction_fusion_cuda, test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_view_of_slice_cuda
2025-12-04T11:31:57.3231535Z 
2025-12-04T11:31:57.3232526Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_autotune_with_unbacked_stride_cuda W1204 11:28:31.577000 102009 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:31:57.3233721Z PASSED [3.7078s] [  3%]
2025-12-04T11:31:57.3234327Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_autotuning_cuda PASSED [0.7196s] [  6%]
2025-12-04T11:31:57.3235352Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_broadcast_tensors_cuda PASSED [1.0656s] [  9%]
2025-12-04T11:31:57.3236463Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_combo_kernel_size_hint_failure_cuda PASSED [0.9586s] [ 12%]
2025-12-04T11:31:57.3237530Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_einsum_cuda PASSED [1.3248s] [ 15%]
2025-12-04T11:31:57.3238556Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_equivalent_backed_unbacked_cuda PASSED [0.8151s] [ 18%]
2025-12-04T11:31:57.3239594Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_expand_cuda PASSED [0.2129s] [ 21%]
2025-12-04T11:31:57.3240647Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_expand_ok_with_runtime_assert_cuda PASSED [0.1673s] [ 25%]
2025-12-04T11:31:57.3241724Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_issue_143498_cuda PASSED [0.8675s] [ 28%]
2025-12-04T11:31:57.3242880Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_addmm_False_cuda PASSED [0.2551s] [ 31%]
2025-12-04T11:31:57.3244008Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_addmm_True_cuda PASSED [0.2240s] [ 34%]
2025-12-04T11:31:57.3245119Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_bmm_False_cuda PASSED [0.2654s] [ 37%]
2025-12-04T11:31:57.3246215Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_bmm_True_cuda PASSED [0.2278s] [ 40%]
2025-12-04T11:31:57.3247290Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_mm_False_cuda PASSED [0.1822s] [ 43%]
2025-12-04T11:31:57.3248387Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_mm_and_friends_mm_True_cuda PASSED [0.1832s] [ 46%]
2025-12-04T11:31:57.3249490Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_nonzero_in_inference_mode_cuda PASSED [0.1517s] [ 50%]
2025-12-04T11:31:57.3250698Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda ('RERUN', {'yellow': True}) [1.5862s] [ 53%]
2025-12-04T11:31:57.3252043Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda ('RERUN', {'yellow': True}) [1.4845s] [ 53%]
2025-12-04T11:31:57.3253223Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda FAILED [1.4461s] [ 53%]
2025-12-04T11:31:57.3253848Z 
2025-12-04T11:31:57.3253991Z ==================================== RERUNS ====================================
2025-12-04T11:31:57.3254571Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________
2025-12-04T11:31:57.3255149Z Traceback (most recent call last):
2025-12-04T11:31:57.3255895Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides
2025-12-04T11:31:57.3256680Z     torch.compile(fn, fullgraph=True)(x, y)
2025-12-04T11:31:57.3257478Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.3258195Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3258782Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn
2025-12-04T11:31:57.3259428Z     def fn(x, y):
2025-12-04T11:31:57.3259997Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.3260672Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3261348Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.3262066Z     return compiled_fn(full_args)
2025-12-04T11:31:57.3262873Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.3263742Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.3264614Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.3265467Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.3266253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.3267078Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.3267881Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.3268671Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.3269459Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.3270251Z     outs = compiled_fn(args)
2025-12-04T11:31:57.3270907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.3271612Z     return self.current_callable(inputs)
2025-12-04T11:31:57.3272258Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.3272909Z     out = model(new_inputs)
2025-12-04T11:31:57.3273550Z   File "/tmp/tmpzk88d_6q/iq/ciqi2pkzc6ppzct2bxn5qysanloemqavdl46uw4qpca7rbcygols.py", line 232, in call
2025-12-04T11:31:57.3274502Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.3275113Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.3275584Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.3275948Z 
2025-12-04T11:31:57.3276160Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.3277088Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.3277817Z 
2025-12-04T11:31:57.3278079Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.3278706Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.3279256Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.3279855Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.3281809Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.3283586Z graph_break []
2025-12-04T11:31:57.3284019Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________
2025-12-04T11:31:57.3284611Z Traceback (most recent call last):
2025-12-04T11:31:57.3285347Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides
2025-12-04T11:31:57.3286122Z     torch.compile(fn, fullgraph=True)(x, y)
2025-12-04T11:31:57.3286888Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.3287607Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3288183Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn
2025-12-04T11:31:57.3288787Z     def fn(x, y):
2025-12-04T11:31:57.3289351Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.3290020Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3290696Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.3291398Z     return compiled_fn(full_args)
2025-12-04T11:31:57.3292212Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.3293079Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.3293947Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.3294784Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.3295574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.3296394Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.3297186Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.3297985Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.3298774Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.3299567Z     outs = compiled_fn(args)
2025-12-04T11:31:57.3300213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.3301123Z     return self.current_callable(inputs)
2025-12-04T11:31:57.3301787Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.3302429Z     out = model(new_inputs)
2025-12-04T11:31:57.3303065Z   File "/tmp/tmpb3ca8a_1/f2/cf2axkrhxyb3addnvgov27lxtexa37daf537i2bseo4z4pj26rjb.py", line 232, in call
2025-12-04T11:31:57.3304016Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.3304628Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.3305082Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.3305462Z 
2025-12-04T11:31:57.3305670Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.3306590Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.3307394Z 
2025-12-04T11:31:57.3307672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.3308280Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.3308800Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.3309396Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.3311384Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.3313122Z graph_break []
2025-12-04T11:31:57.3313501Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.3314068Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.3314661Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.3316556Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.3318247Z graph_break []
2025-12-04T11:31:57.3318542Z =================================== FAILURES ===================================
2025-12-04T11:31:57.3319124Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________
2025-12-04T11:31:57.3319654Z Traceback (most recent call last):
2025-12-04T11:31:57.3320392Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides
2025-12-04T11:31:57.3321163Z     torch.compile(fn, fullgraph=True)(x, y)
2025-12-04T11:31:57.3321891Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.3322899Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3323825Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn
2025-12-04T11:31:57.3324555Z     def fn(x, y):
2025-12-04T11:31:57.3325225Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.3326079Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3326809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.3327611Z     return compiled_fn(full_args)
2025-12-04T11:31:57.3328600Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.3329592Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.3330547Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.3331577Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.3332496Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.3333361Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.3334342Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.3335253Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.3336159Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.3337097Z     outs = compiled_fn(args)
2025-12-04T11:31:57.3337864Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.3338718Z     return self.current_callable(inputs)
2025-12-04T11:31:57.3339569Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.3340276Z     out = model(new_inputs)
2025-12-04T11:31:57.3341115Z   File "/tmp/tmp4vd8saqk/ju/cjuv7bhfdqljwtykv6sxmap45z57mfp65htxyudkics6zhsum7hk.py", line 232, in call
2025-12-04T11:31:57.3342222Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.3342940Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.3343557Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.3344029Z 
2025-12-04T11:31:57.3344253Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.3364506Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.3365338Z 
2025-12-04T11:31:57.3365622Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.3366240Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.3366770Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.3367373Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.3369292Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.3370992Z graph_break []
2025-12-04T11:31:57.3371366Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.3371884Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.3372481Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.3374378Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.3376068Z graph_break []
2025-12-04T11:31:57.3376439Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.3376955Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.3377539Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.3379453Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.3381143Z graph_break []
2025-12-04T11:31:57.3382084Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-ad02460068a39927.xml -
2025-12-04T11:31:57.3383147Z =========================== short test summary info ============================
2025-12-04T11:31:57.3384240Z FAILED [1.4461s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda - RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.3385197Z 
2025-12-04T11:31:57.3385414Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.3386338Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.3387049Z 
2025-12-04T11:31:57.3387311Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.3387931Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:31:57.3388436Z ==================== 1 failed, 16 passed, 2 rerun in 15.91s ====================
2025-12-04T11:31:57.3388867Z Got exit code 1
2025-12-04T11:31:57.3389117Z Retrying single test...
2025-12-04T11:31:57.3389929Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-e60f88ff4be47487.xml
2025-12-04T11:31:57.3390818Z ============================= test session starts ==============================
2025-12-04T11:31:57.3391485Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:31:57.3392076Z cachedir: .pytest_cache
2025-12-04T11:31:57.3392777Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:31:57.3393542Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:31:57.3393875Z configfile: pytest.ini
2025-12-04T11:31:57.3394638Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:31:57.3395563Z collecting ... collected 32 items / 31 deselected / 1 selected
2025-12-04T11:31:57.3396574Z stepcurrent: skipping 16 already run items. Running only test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.3397468Z Running 1 items in this shard
2025-12-04T11:31:57.3397687Z 
2025-12-04T11:31:57.3398622Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:01.422368080 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor)
2025-12-04T11:31:57.3400197Z [W1204 11:29:01.422391520 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.3401038Z 
2025-12-04T11:31:57.3401226Z ('RERUN', {'yellow': True}) [20.3490s] [100%]
2025-12-04T11:31:57.3402524Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:18.573688796 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.3403650Z 
2025-12-04T11:31:57.3403784Z ('RERUN', {'yellow': True}) [1.4459s] [100%]
2025-12-04T11:31:57.3405019Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:20.986784791 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.3406122Z 
2025-12-04T11:31:57.3406236Z FAILED [1.4106s] [100%]
2025-12-04T11:31:57.3406407Z 
2025-12-04T11:31:57.3406545Z ==================================== RERUNS ====================================
2025-12-04T11:31:57.3407122Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________
2025-12-04T11:31:57.3407665Z Traceback (most recent call last):
2025-12-04T11:31:57.3408402Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides
2025-12-04T11:31:57.3409168Z     torch.compile(fn, fullgraph=True)(x, y)
2025-12-04T11:31:57.3409920Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.3410637Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3411319Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn
2025-12-04T11:31:57.3411911Z     def fn(x, y):
2025-12-04T11:31:57.3412489Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.3413155Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3413809Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.3414520Z     return compiled_fn(full_args)
2025-12-04T11:31:57.3415387Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.3416253Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.3417150Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.3418009Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.3418802Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.3419661Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.3420472Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.3421274Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.3422066Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.3422855Z     outs = compiled_fn(args)
2025-12-04T11:31:57.3423517Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.3424234Z     return self.current_callable(inputs)
2025-12-04T11:31:57.3424883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.3425515Z     out = model(new_inputs)
2025-12-04T11:31:57.3426195Z   File "/tmp/tmp0crgso0o/xs/cxsear67vvgewktho4wjienlirzjq7esl7uuxpz24mb2iy7tu5av.py", line 232, in call
2025-12-04T11:31:57.3427162Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.3427763Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.3428239Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.3429351Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.3430321Z C++ CapturedTraceback:
2025-12-04T11:31:57.3431779Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.3433670Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.3434608Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.3435873Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.3438048Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.3441128Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.3450035Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.3460803Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.3465709Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.3467816Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.3472850Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.3478153Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.3480004Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.3484751Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.3489060Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.3490551Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.3492229Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.3498058Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.3503835Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.3504546Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.3505273Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.3505938Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.3506787Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3507594Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3508377Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3509106Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3509954Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3510889Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3511794Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3512699Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3513593Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3514499Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3515314Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3516043Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3516763Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3517602Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3518505Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3519406Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3520296Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3521056Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3521814Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3522709Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3523427Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3524162Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3524998Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3525890Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3526789Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3527696Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3528600Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3529490Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3530255Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3530828Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.3531439Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3532330Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3532993Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.3533347Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.3533968Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3534728Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3535488Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3536499Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3537396Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3538224Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.3538908Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3539656Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3540451Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.3541137Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3541908Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3542694Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.3543376Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3544134Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3544930Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.3545597Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3546355Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3547113Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3547858Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3548618Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3549382Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3550139Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3550888Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3551795Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3552695Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3553597Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3554486Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3555388Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3556290Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3557192Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3558115Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3559019Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3559921Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3560702Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.3561412Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3562171Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3563169Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.3563984Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3564718Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3565476Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3566321Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3567234Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3568162Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3569090Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3569867Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3570648Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3571566Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3572494Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3573414Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3574333Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3575187Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.3575989Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3576732Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3577443Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.3578101Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3578879Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3579804Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3580716Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3581640Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3582564Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3583341Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3584100Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3585019Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3586040Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3586968Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3587877Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3588651Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3589463Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3590379Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3591282Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3592227Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3593142Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3594024Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.3594819Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3595570Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3596308Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3597154Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3598075Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3598853Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3599628Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3600542Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3601784Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3602814Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3603739Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3604599Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.3605400Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3606146Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3606869Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3607727Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3608655Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3609576Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3610490Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3611411Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3612325Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3613104Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3613865Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3614882Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3615808Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3616733Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3617639Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3618555Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.3619355Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3620081Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3620868Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3621717Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3622680Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3623587Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3624503Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3625420Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3626337Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3627242Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3628159Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3629083Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3630002Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3630794Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.3631527Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.3632241Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.3632913Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.3633687Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.3634498Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.3635246Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.3635931Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.3636602Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.3637198Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.3637613Z #184 _start from ??:0
2025-12-04T11:31:57.3637906Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.3638151Z 
2025-12-04T11:31:57.3638156Z 
2025-12-04T11:31:57.3638371Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.3639304Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.3640024Z 
2025-12-04T11:31:57.3640288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.3640910Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.3641429Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.3643254Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.3645068Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.3645637Z graph_break []
2025-12-04T11:31:57.3646080Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________
2025-12-04T11:31:57.3646620Z Traceback (most recent call last):
2025-12-04T11:31:57.3647379Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides
2025-12-04T11:31:57.3648152Z     torch.compile(fn, fullgraph=True)(x, y)
2025-12-04T11:31:57.3648897Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.3649652Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3650212Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn
2025-12-04T11:31:57.3650809Z     def fn(x, y):
2025-12-04T11:31:57.3651388Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.3652045Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3652718Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.3653427Z     return compiled_fn(full_args)
2025-12-04T11:31:57.3654224Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.3655087Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.3655957Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.3656806Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.3657597Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.3658427Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.3659237Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.3660041Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.3660823Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.3661620Z     outs = compiled_fn(args)
2025-12-04T11:31:57.3662277Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.3662986Z     return self.current_callable(inputs)
2025-12-04T11:31:57.3663642Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.3664281Z     out = model(new_inputs)
2025-12-04T11:31:57.3664936Z   File "/tmp/tmp6k3ykq_m/qz/cqze7pled3tdw4klik77kfgsmkopndkuq3mwgtsdaohpcttno6f5.py", line 232, in call
2025-12-04T11:31:57.3665875Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.3666492Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.3666960Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.3668060Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.3669022Z C++ CapturedTraceback:
2025-12-04T11:31:57.3670537Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.3672409Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.3673374Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.3674613Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.3676819Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.3679933Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.3688862Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.3699561Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.3704708Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.3706857Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.3711792Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.3717125Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.3718930Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.3723729Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.3728045Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.3729526Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.3731208Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.3737081Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.3742553Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.3743271Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.3743988Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.3744649Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.3745418Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3746231Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3746950Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3747678Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3748516Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3749413Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3750314Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3751221Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3752129Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3753017Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3753823Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3754561Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3755283Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3756104Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3757008Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3757911Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3758813Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3759566Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3760321Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3761177Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3761897Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3762701Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3763540Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3764479Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3765366Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3766305Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3767205Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3768110Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3768893Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3769406Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.3770013Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3770906Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3771566Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.3771922Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.3772526Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3773277Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3774036Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3774945Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3775846Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3776628Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.3777304Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3778063Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3778855Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.3779538Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3780302Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3781100Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.3781780Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3782537Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3783329Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.3784013Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3784763Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3785530Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3786295Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3787045Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3787850Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3788619Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3789378Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3790270Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3791202Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3792107Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3793019Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3793950Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3794855Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3795785Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3796673Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3797577Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3798482Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3799271Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.3799936Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3800695Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3801798Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.3802685Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3803417Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3804145Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3804997Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3805929Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3806836Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3807764Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3808542Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3809299Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3810220Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3811136Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3812052Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3812954Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3813826Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.3814622Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3815366Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3816057Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.3816814Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3817595Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3818504Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3819473Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3820394Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3821311Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3822116Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3822891Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3823856Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3824779Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3825682Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3826597Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3827368Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3828148Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3829053Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3829968Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3830887Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3831803Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3832653Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.3834167Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3834912Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3835635Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3836485Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3837402Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3838174Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3838937Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3839852Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3840767Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3841686Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3842691Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3843557Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.3844362Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3845156Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3845886Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3846736Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3847654Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3848594Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3849518Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3850437Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3851386Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3852144Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.3852948Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3853863Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3854781Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3855684Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3856607Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3857470Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.3858273Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3859003Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3859747Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3860593Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3861505Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3862428Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3863343Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3864263Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3865169Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3866087Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3867014Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3867931Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3868836Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3869638Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.3870365Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.3871075Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.3871752Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.3872517Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.3873365Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.3874106Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.3874802Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.3875468Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.3876063Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.3876507Z #184 _start from ??:0
2025-12-04T11:31:57.3876798Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.3877026Z 
2025-12-04T11:31:57.3877031Z 
2025-12-04T11:31:57.3877255Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.3878209Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.3878933Z 
2025-12-04T11:31:57.3879197Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.3879854Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.3880371Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.3882013Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.3883924Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.3884470Z graph_break []
2025-12-04T11:31:57.3884842Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.3885341Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.3885939Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.3887855Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.3889545Z graph_break []
2025-12-04T11:31:57.3889842Z =================================== FAILURES ===================================
2025-12-04T11:31:57.3890408Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________
2025-12-04T11:31:57.3890952Z Traceback (most recent call last):
2025-12-04T11:31:57.3891694Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides
2025-12-04T11:31:57.3892461Z     torch.compile(fn, fullgraph=True)(x, y)
2025-12-04T11:31:57.3893205Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.3893927Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3894502Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn
2025-12-04T11:31:57.3895092Z     def fn(x, y):
2025-12-04T11:31:57.3895665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.3896335Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.3896992Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.3897706Z     return compiled_fn(full_args)
2025-12-04T11:31:57.3898523Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.3899386Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.3900278Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.3901292Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.3902091Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.3902906Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.3903804Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.3904606Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.3905397Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.3906230Z     outs = compiled_fn(args)
2025-12-04T11:31:57.3906890Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.3907649Z     return self.current_callable(inputs)
2025-12-04T11:31:57.3908298Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.3908928Z     out = model(new_inputs)
2025-12-04T11:31:57.3909604Z   File "/tmp/tmpcnm13bt0/d7/cd7ienhq7syisf2qdafw5dp4zbzrps2m3gys7cbv7re7algg3qc3.py", line 232, in call
2025-12-04T11:31:57.3910574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.3911173Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.3911640Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.3912751Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.3913727Z C++ CapturedTraceback:
2025-12-04T11:31:57.3915184Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.3917055Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.3917998Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.3919253Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.3921436Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.3924559Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.3933427Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.3944171Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.3949075Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.3951157Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.3956135Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.3961427Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.3963319Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.3968002Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.3972369Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.3973857Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.3975541Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.3981325Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.3986748Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.3987468Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.3988200Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.3988848Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.3989618Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3990432Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3991202Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3991922Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.3992769Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.3993681Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3994700Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3995594Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3996532Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.3997435Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.3998232Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.3998995Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.3999725Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4000567Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4001691Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4002690Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4003590Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4004357Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4005108Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4005918Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4006643Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4007352Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4008193Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4009098Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4010000Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4010895Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4011798Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4012708Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4013472Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4013973Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.4014580Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4015488Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4016144Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.4016488Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.4017090Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4017851Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4018699Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4019606Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4020507Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4021303Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4022017Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4022783Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4023579Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4024294Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4025058Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4025851Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4026573Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4027317Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4028105Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4028784Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4029544Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4030295Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4031065Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4031824Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4032572Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4033332Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4034088Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4034993Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4035885Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4036788Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4037689Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4038588Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4039480Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4040384Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4041282Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4042177Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4043146Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4043943Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4044621Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4045364Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4046276Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4047062Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4047795Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4048504Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4049386Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4050316Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4051245Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4052190Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4052968Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4053777Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4054685Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4055605Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4056522Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4057437Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4058288Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4059089Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4059829Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4060536Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.4061197Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4061975Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4062902Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4063840Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4064755Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4065682Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4066468Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4067237Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4068165Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4069088Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4070014Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4070925Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4071705Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4072482Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4073405Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4074356Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4075279Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4076200Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4077066Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4077893Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4078637Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4079379Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4080252Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4081173Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4081980Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4082847Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4083761Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4084680Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4085599Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4086521Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4087377Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4088179Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4088930Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4089667Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4090505Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4091423Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4092343Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4093250Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4094171Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4095090Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4095867Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4096625Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4097544Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4098460Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4099380Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4100283Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4101307Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4102113Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4102947Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4103685Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4104540Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4105464Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4106411Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4107334Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4108249Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4109209Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4110116Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4111084Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4112003Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4112920Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4113709Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.4114443Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.4115155Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.4115840Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.4116599Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.4117408Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.4118153Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.4118840Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.4119510Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.4120105Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.4120533Z #184 _start from ??:0
2025-12-04T11:31:57.4120815Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.4121058Z 
2025-12-04T11:31:57.4121063Z 
2025-12-04T11:31:57.4121275Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.4122208Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.4123025Z 
2025-12-04T11:31:57.4123305Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.4123917Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.4124438Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.4126089Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.4127905Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.4128440Z graph_break []
2025-12-04T11:31:57.4128810Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.4129338Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.4129976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.4131909Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.4133602Z graph_break []
2025-12-04T11:31:57.4133971Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.4134471Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.4135097Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.4137010Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.4138736Z graph_break []
2025-12-04T11:31:57.4139672Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-e60f88ff4be47487.xml -
2025-12-04T11:31:57.4140740Z =========================== short test summary info ============================
2025-12-04T11:31:57.4141834Z FAILED [1.4106s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda - RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.4143478Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.4144455Z C++ CapturedTraceback:
2025-12-04T11:31:57.4145917Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.4147794Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.4148732Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.4149998Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.4152192Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.4155269Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4164207Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4174944Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.4179864Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.4181949Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.4186953Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4192252Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.4194082Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.4198765Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.4203497Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.4204981Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.4206666Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.4212457Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.4217872Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.4218584Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.4219305Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.4219952Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.4220715Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4221603Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4222341Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4223053Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4223895Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4224842Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4225748Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4226638Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4227604Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4228504Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4229337Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4230063Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4230782Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4231617Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4232505Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4233405Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4234309Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4235072Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4235823Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4236632Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4237358Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4238093Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4238930Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4239849Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4240754Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4241644Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4242661Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4243575Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4244346Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4244854Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.4245471Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4246385Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4247045Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.4247390Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.4248057Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4248835Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4249650Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4267315Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4268364Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4269175Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4269970Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4270729Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4271576Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4272253Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4273008Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4273846Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4274522Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4275286Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4276067Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4276750Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4277506Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4278263Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4279021Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4279779Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4280536Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4281281Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4282039Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4283042Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4283949Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4284841Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4285743Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4286650Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4287546Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4288447Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4289365Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4290269Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4291162Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4291953Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4292637Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4293394Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4294298Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4295086Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4295820Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4296556Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4297431Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4298360Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4299311Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4300220Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4301273Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4302139Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4303061Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4303970Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4304891Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4305810Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4306678Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4307460Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4308205Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4308912Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.4309575Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4310334Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4311258Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4312185Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4313084Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4314014Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4314793Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4315566Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4316475Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4317391Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4318311Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4319228Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4319982Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4320756Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4321675Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4322739Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4323650Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4324572Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4325477Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4326262Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4327003Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4327785Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4328639Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4329547Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4330353Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4331127Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4332043Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4332952Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4333871Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4334793Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4335654Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4336445Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4337191Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4337927Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4338763Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4339687Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4340602Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4341519Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4342429Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4343352Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4344129Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4344905Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4345818Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4346745Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4347662Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4348582Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4349440Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4350290Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4351040Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4351763Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4352615Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4353605Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4354525Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4355434Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4356399Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4357311Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4358265Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4359173Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4360084Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4361001Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4361806Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.4362616Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.4363337Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.4364025Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.4364780Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.4365604Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.4366368Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.4367065Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.4367727Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.4368327Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.4368755Z #184 _start from ??:0
2025-12-04T11:31:57.4369037Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.4369283Z 
2025-12-04T11:31:57.4369288Z 
2025-12-04T11:31:57.4369503Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.4370447Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.4371169Z 
2025-12-04T11:31:57.4371446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.4372025Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:31:57.4372543Z ================== 1 failed, 31 deselected, 2 rerun in 23.24s ==================
2025-12-04T11:31:57.4372980Z Got exit code 1
2025-12-04T11:31:57.4373247Z Retrying single test...
2025-12-04T11:31:57.4374028Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-2d7921f0967c562b.xml
2025-12-04T11:31:57.4374930Z ============================= test session starts ==============================
2025-12-04T11:31:57.4375592Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:31:57.4376168Z cachedir: .pytest_cache
2025-12-04T11:31:57.4376928Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:31:57.4377702Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:31:57.4378055Z configfile: pytest.ini
2025-12-04T11:31:57.4378811Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:31:57.4379741Z collecting ... collected 32 items / 31 deselected / 1 selected
2025-12-04T11:31:57.4380806Z stepcurrent: skipping 16 already run items. Running only test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.4381718Z Running 1 items in this shard
2025-12-04T11:31:57.4381961Z 
2025-12-04T11:31:57.4382903Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:37.237515166 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor)
2025-12-04T11:31:57.4384519Z [W1204 11:29:37.237537088 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.4385172Z 
2025-12-04T11:31:57.4385304Z ('RERUN', {'yellow': True}) [20.4072s] [100%]
2025-12-04T11:31:57.4386539Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:54.428333142 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.4387659Z 
2025-12-04T11:31:57.4387789Z ('RERUN', {'yellow': True}) [1.4503s] [100%]
2025-12-04T11:31:57.4389015Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda [W1204 11:29:56.849370129 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.4390128Z 
2025-12-04T11:31:57.4390225Z FAILED [1.4188s] [100%]
2025-12-04T11:31:57.4390395Z 
2025-12-04T11:31:57.4390546Z ==================================== RERUNS ====================================
2025-12-04T11:31:57.4391114Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________
2025-12-04T11:31:57.4391663Z Traceback (most recent call last):
2025-12-04T11:31:57.4392400Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides
2025-12-04T11:31:57.4393179Z     torch.compile(fn, fullgraph=True)(x, y)
2025-12-04T11:31:57.4393911Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.4394636Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.4395208Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn
2025-12-04T11:31:57.4395800Z     def fn(x, y):
2025-12-04T11:31:57.4396378Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.4397047Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.4397717Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.4398423Z     return compiled_fn(full_args)
2025-12-04T11:31:57.4399253Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.4400122Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.4401280Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.4402145Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.4403046Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.4403885Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.4404779Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.4405589Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.4406379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.4407189Z     outs = compiled_fn(args)
2025-12-04T11:31:57.4407882Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.4408606Z     return self.current_callable(inputs)
2025-12-04T11:31:57.4409260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.4409939Z     out = model(new_inputs)
2025-12-04T11:31:57.4410610Z   File "/tmp/tmp5a7naebo/b3/cb3ut5djh46v5f4z2ofuyumglu2gofja3wthfdneyepvef4lkznn.py", line 232, in call
2025-12-04T11:31:57.4411577Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.4412240Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.4412704Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.4413819Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.4414804Z C++ CapturedTraceback:
2025-12-04T11:31:57.4416288Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.4418158Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.4419101Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.4420377Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.4422563Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.4425661Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4434497Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4445361Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.4450319Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.4452428Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.4457386Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4462663Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.4464491Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.4469343Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.4473700Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.4475229Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.4476900Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.4482790Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.4488251Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.4488960Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.4489696Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.4490362Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.4491114Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4491924Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4492655Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4493379Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4494206Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4495113Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4496068Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4496977Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4497864Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4498771Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4499603Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4500329Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4501205Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4502126Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4503034Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4503967Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4504866Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4505628Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4506390Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4507183Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4507910Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4508641Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4509476Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4510378Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4511281Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4512185Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4513093Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4513989Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4514751Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4515276Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.4515870Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4516775Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4517436Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.4517793Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.4518384Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4519144Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4519900Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4520786Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4521691Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4522588Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4523278Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4524123Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4524920Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4525601Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4526364Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4527184Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4527866Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4528625Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4529436Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4530113Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4530904Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4531662Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4532404Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4533160Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4533912Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4534676Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4535420Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4536319Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4537292Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4538186Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4539084Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4539992Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4540898Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4541783Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4542686Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4543594Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4544499Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4545282Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4545963Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4546715Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4547572Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4548340Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4549071Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4549802Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4550639Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4551627Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4552553Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4553477Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4554241Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4555058Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4555981Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4556941Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4557850Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4558828Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4559699Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4560503Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4561234Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4561943Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.4562713Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4563475Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4564399Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4565324Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4566244Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4567153Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4567927Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4568734Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4569650Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4570557Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4571476Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4572399Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4573160Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4573925Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4574844Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4575770Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4576673Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4577591Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4578461Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4579255Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4580043Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4580780Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4581629Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4582581Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4583341Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4584111Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4585060Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4585966Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4586918Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4587837Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4588701Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4589483Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4590230Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4590968Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4591824Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4592733Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4593655Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4594572Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4595489Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4596391Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4597166Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4597936Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4598839Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4599756Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4600672Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4601848Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4602837Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4603640Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4604388Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4605129Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4605971Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4606891Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4607932Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4608860Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4609762Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4610679Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4611639Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4612543Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4613459Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4614418Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4615220Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.4615977Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.4616689Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.4617374Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.4618146Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.4618946Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.4619698Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.4620405Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.4621063Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.4621660Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.4622095Z #184 _start from ??:0
2025-12-04T11:31:57.4622390Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.4622620Z 
2025-12-04T11:31:57.4622626Z 
2025-12-04T11:31:57.4622839Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.4623777Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.4624505Z 
2025-12-04T11:31:57.4624772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.4625398Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.4625907Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.4627573Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.4629405Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.4629957Z graph_break []
2025-12-04T11:31:57.4630395Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________
2025-12-04T11:31:57.4630949Z Traceback (most recent call last):
2025-12-04T11:31:57.4631701Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides
2025-12-04T11:31:57.4632472Z     torch.compile(fn, fullgraph=True)(x, y)
2025-12-04T11:31:57.4633218Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.4633955Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.4634535Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn
2025-12-04T11:31:57.4635170Z     def fn(x, y):
2025-12-04T11:31:57.4635762Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.4636433Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.4637104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.4637820Z     return compiled_fn(full_args)
2025-12-04T11:31:57.4638679Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.4639555Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.4640422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.4641314Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.4642111Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.4643093Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.4643883Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.4644686Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.4645482Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.4646268Z     outs = compiled_fn(args)
2025-12-04T11:31:57.4646926Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.4647642Z     return self.current_callable(inputs)
2025-12-04T11:31:57.4648290Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.4648926Z     out = model(new_inputs)
2025-12-04T11:31:57.4649597Z   File "/tmp/tmpc22tbkmr/7v/c7vp5axqaeorg7ro46hdflae277p4tydujnrbi65og4m7x4bl36l.py", line 232, in call
2025-12-04T11:31:57.4650561Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.4651175Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.4651634Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.4652740Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.4653713Z C++ CapturedTraceback:
2025-12-04T11:31:57.4655181Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.4657051Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.4657989Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.4659250Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.4661432Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.4664547Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4673311Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4684159Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.4689071Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.4691175Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.4696218Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4701752Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.4703577Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.4708345Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.4712671Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.4714148Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.4715838Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.4721619Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.4727215Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.4727940Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.4728671Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.4729333Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.4730123Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4730931Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4731659Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4732417Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4733257Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4734197Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4735101Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4735993Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4736902Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4737799Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4738605Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4739321Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4740045Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4740883Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4741778Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4742674Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4743578Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4744341Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4745083Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4745896Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4746628Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4747358Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4748188Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4749094Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4749993Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4750897Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4751785Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4752691Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4753452Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4753956Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.4754610Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4755512Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4756169Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.4756516Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.4757148Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4757912Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4758661Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4759598Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4760503Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4761335Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4762006Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4762835Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4763641Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4764329Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4765076Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4765870Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4766555Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4767306Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4768108Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4768787Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4769549Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4770298Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4771066Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4771826Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4772590Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4773340Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4774096Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4775003Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4775894Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4776790Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4777702Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4778607Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4779496Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4780405Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4781350Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4782256Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4783148Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4783935Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4784667Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4785424Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4786260Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4787082Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4787814Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4788528Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4789409Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4790331Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4791248Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4792153Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4792927Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4793706Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4794625Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4795530Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4796450Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4797369Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4798233Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4799016Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4799761Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4800467Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.4801356Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4802133Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4803148Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4804072Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4804975Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4805898Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4806671Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4807446Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4808351Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4809264Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4810267Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4811184Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4811942Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4812715Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4813682Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4814586Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4815539Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4816456Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4817359Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4818142Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4818879Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4819613Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4820465Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4821369Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4822150Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4822924Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4823845Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4824763Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4825682Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4826600Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4827445Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4828239Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4828976Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4829712Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4830548Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4831469Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4832388Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4833312Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4834216Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4835145Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4835922Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4836702Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4837649Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4838579Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4839496Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4840405Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4841304Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4842102Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4842941Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4843792Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4844646Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4845608Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4846533Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4847441Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4848365Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4849280Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4850198Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4851105Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4852026Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4852943Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4853731Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.4854459Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.4855165Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.4855848Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.4856596Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.4857404Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.4858150Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.4858851Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.4859513Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.4860112Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.4860536Z #184 _start from ??:0
2025-12-04T11:31:57.4860824Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.4861071Z 
2025-12-04T11:31:57.4861076Z 
2025-12-04T11:31:57.4861289Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.4862225Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.4862934Z 
2025-12-04T11:31:57.4863216Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.4863440Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.4863602Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.4865013Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.4865320Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.4865460Z graph_break []
2025-12-04T11:31:57.4865678Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.4865835Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.4866174Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.4867637Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.4867781Z graph_break []
2025-12-04T11:31:57.4867922Z =================================== FAILURES ===================================
2025-12-04T11:31:57.4868208Z ___________ TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda ___________
2025-12-04T11:31:57.4868339Z Traceback (most recent call last):
2025-12-04T11:31:57.4868844Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 553, in test_sdfpa_unbacked_strides
2025-12-04T11:31:57.4868996Z     torch.compile(fn, fullgraph=True)(x, y)
2025-12-04T11:31:57.4869480Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.4869594Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.4869973Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 533, in fn
2025-12-04T11:31:57.4870069Z     def fn(x, y):
2025-12-04T11:31:57.4870484Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.4870606Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.4871067Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.4871199Z     return compiled_fn(full_args)
2025-12-04T11:31:57.4871789Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.4871926Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.4872542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.4872659Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.4873221Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.4873367Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.4873907Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.4874038Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.4874589Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.4874699Z     outs = compiled_fn(args)
2025-12-04T11:31:57.4875160Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.4875291Z     return self.current_callable(inputs)
2025-12-04T11:31:57.4875700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.4875876Z     out = model(new_inputs)
2025-12-04T11:31:57.4876355Z   File "/tmp/tmpj7ii8w8h/fj/cfj7rfg42bnrfqziropycccddw22twn3ztyyugkctad7o4cjkzxo.py", line 232, in call
2025-12-04T11:31:57.4876725Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.4876837Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.4877078Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.4877866Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.4877978Z C++ CapturedTraceback:
2025-12-04T11:31:57.4879314Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.4879900Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.4880229Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.4881043Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.4882294Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.4884066Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4891088Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4894556Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.4895978Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.4896588Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.4900796Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4901706Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.4902677Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.4906313Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.4906950Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.4907716Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.4908550Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.4913410Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.4913732Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.4914063Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.4914327Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.4914585Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.4914963Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4915261Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4915550Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4915855Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4916255Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4916631Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4917029Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4917394Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4917804Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4918164Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4918474Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4918759Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4919103Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4919516Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4919877Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4920280Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4920668Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4920922Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4921296Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4921619Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4921905Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4922272Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4922767Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4923147Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4923544Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4923904Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4924310Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4924673Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4924940Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4925067Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.4925430Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4925837Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4925956Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.4926071Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.4926448Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4926699Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4927075Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4927470Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4927831Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4928129Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4928380Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4928754Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4929036Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4929289Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4929664Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4929949Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4930198Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4931088Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4931380Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4931645Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4932009Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4932291Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4932668Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4932920Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4933332Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4933583Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4933951Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4934394Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4934755Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4935154Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4935531Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4935925Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4936304Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4936699Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4937064Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4937474Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4937837Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4938136Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.4938389Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4938748Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4939102Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4939402Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4939707Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4940007Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4940414Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4940796Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4941203Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4941573Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4941848Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4942217Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4942637Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4943040Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4943444Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4943826Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4944209Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4944526Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4944815Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4945112Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.4945381Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4945748Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4946179Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4946556Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4946957Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4947337Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4947597Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4947962Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4948378Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4948745Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4949159Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4949528Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4949785Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4950165Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4950567Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4950947Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4951351Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4951723Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4952087Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4952390Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4952678Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4952996Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4953402Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4953783Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4954045Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4954413Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4954863Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4955232Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4955646Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4956043Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4956392Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4956707Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4957030Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4957341Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4957746Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4958141Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4958554Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4958922Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4959326Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4959707Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4959970Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.4960351Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4960753Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4961121Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4961536Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4961904Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4962264Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.4962656Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.4962953Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.4963267Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.4963672Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.4964059Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4964464Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4964831Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4965251Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4965619Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4966024Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4966411Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4966853Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.4967240Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.4967524Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.4967825Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.4968130Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.4968414Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.4968768Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.4969144Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.4969424Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.4969734Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.4969998Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.4970189Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.4970304Z #184 _start from ??:0
2025-12-04T11:31:57.4970424Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.4970430Z 
2025-12-04T11:31:57.4970435Z 
2025-12-04T11:31:57.4970665Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.4971253Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.4971262Z 
2025-12-04T11:31:57.4971523Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.4971754Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.4971917Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.4973287Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.4973590Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.4973687Z graph_break []
2025-12-04T11:31:57.4973916Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.4974074Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.4974383Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.4975853Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.4975950Z graph_break []
2025-12-04T11:31:57.4976173Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.4976329Z stats [('calls_captured', 29), ('unique_graphs', 1)]
2025-12-04T11:31:57.4976643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.4978100Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 13), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.4978235Z graph_break []
2025-12-04T11:31:57.4979022Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-2d7921f0967c562b.xml -
2025-12-04T11:31:57.4979190Z =========================== short test summary info ============================
2025-12-04T11:31:57.4980002Z FAILED [1.4188s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda - RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.4980738Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.4980876Z C++ CapturedTraceback:
2025-12-04T11:31:57.4982156Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.4982661Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.4982999Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.4983790Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.4985065Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.4986741Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4993725Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.4997198Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.4998638Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.4999233Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5003804Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5004555Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5005519Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.5009173Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.5009799Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.5010586Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5011407Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5016284Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.5016596Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.5016929Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.5017194Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5017462Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.5017828Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5018126Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5018424Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5018721Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5019134Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5019499Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5019896Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5020269Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5020662Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5021018Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5021331Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5021652Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5021963Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5022357Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5022717Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5023169Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5023531Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5023797Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5024190Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5024484Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5024811Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5025124Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5025534Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5025895Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5026293Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5026666Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5027060Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5027423Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5027692Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5027816Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5028190Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5028584Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5028703Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.5028835Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5029197Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5029454Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5029831Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5030227Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5030605Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5030893Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5031146Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5031521Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5031808Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5032070Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5032431Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5032714Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5032982Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5033383Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5033667Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5033930Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5034292Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5034587Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5034954Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5035230Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5035609Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5035860Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5036262Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5036659Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5037023Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5037437Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5037794Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5038203Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5038568Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5038966Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5039341Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5039733Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5040090Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5040388Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5040639Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5041013Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5041350Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5041648Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5041951Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5042242Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5042744Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5043115Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5043519Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5043901Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5044159Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5044541Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5044985Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5045358Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5045772Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5046139Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5046515Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5046832Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5047156Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5047434Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5047693Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5048091Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5048506Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5048874Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5049292Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5049658Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5049913Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5050292Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5050695Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5051075Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5051480Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5051844Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5052116Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5052484Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5052885Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5053265Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5053670Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5054052Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5054400Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5054702Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5055006Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5055309Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5055721Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5056089Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5056345Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5056760Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5057166Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5057534Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5057946Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5058358Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5058718Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5059051Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5059341Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5059657Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5060091Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5060473Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5060875Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5061243Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5061661Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5062032Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5062305Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5062676Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5063077Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5063461Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5063861Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5064227Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5064584Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5064887Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5065191Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5065491Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5065898Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5066278Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5066679Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5067058Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5067458Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5067821Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5068236Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5068646Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5069062Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5069429Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5069711Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.5070051Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.5070317Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.5070594Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.5070979Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.5071296Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.5071592Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.5071890Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.5072152Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.5072358Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.5072457Z #184 _start from ??:0
2025-12-04T11:31:57.5072573Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.5072596Z 
2025-12-04T11:31:57.5072601Z 
2025-12-04T11:31:57.5072814Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5073402Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.5073411Z 
2025-12-04T11:31:57.5073687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5073867Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:31:57.5074075Z ================== 1 failed, 31 deselected, 2 rerun in 23.31s ==================
2025-12-04T11:31:57.5074172Z Got exit code 1
2025-12-04T11:31:57.5074686Z FAILED CONSISTENTLY: test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda
2025-12-04T11:31:57.5075102Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:31:57.5075701Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-38b03205b4b4e8b2.xml
2025-12-04T11:31:57.5075862Z ============================= test session starts ==============================
2025-12-04T11:31:57.5076218Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:31:57.5076325Z cachedir: .pytest_cache
2025-12-04T11:31:57.5076847Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:31:57.5076972Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:31:57.5077075Z configfile: pytest.ini
2025-12-04T11:31:57.5077667Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:31:57.5077885Z collecting ... collected 32 items / 17 deselected / 15 selected
2025-12-04T11:31:57.5078024Z stepcurrent: skipping 17 already run items.
2025-12-04T11:31:57.5078153Z Running 15 items in this shard
2025-12-04T11:31:57.5078158Z 
2025-12-04T11:31:57.5078650Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda ('RERUN', {'yellow': True}) [4.5453s] [  6%]
2025-12-04T11:31:57.5079150Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda ('RERUN', {'yellow': True}) [1.5071s] [  6%]
2025-12-04T11:31:57.5079579Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda FAILED [1.3130s] [  6%]
2025-12-04T11:31:57.5079587Z 
2025-12-04T11:31:57.5079727Z ==================================== RERUNS ====================================
2025-12-04T11:31:57.5079997Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________
2025-12-04T11:31:57.5080114Z Traceback (most recent call last):
2025-12-04T11:31:57.5080539Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa
2025-12-04T11:31:57.5080694Z     torch.compile(fn, fullgraph=True)(x)
2025-12-04T11:31:57.5081174Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.5081295Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5081760Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn
2025-12-04T11:31:57.5081855Z     def fn(x):
2025-12-04T11:31:57.5082291Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.5082526Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5083000Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.5096109Z     return compiled_fn(full_args)
2025-12-04T11:31:57.5096828Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.5096995Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.5097603Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.5097726Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.5098303Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.5098435Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.5099002Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.5099120Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.5099665Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.5099793Z     outs = compiled_fn(args)
2025-12-04T11:31:57.5100249Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.5100390Z     return self.current_callable(inputs)
2025-12-04T11:31:57.5100783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.5101100Z     out = model(new_inputs)
2025-12-04T11:31:57.5101594Z   File "/tmp/tmpbxclaczo/tb/ctbqebvmruj4nkytdlerbrxeyr4bumhrd3m254oilw7ylx6twan5.py", line 227, in call
2025-12-04T11:31:57.5101955Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.5102071Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.5102324Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5102330Z 
2025-12-04T11:31:57.5102540Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5103063Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5103069Z 
2025-12-04T11:31:57.5103331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5103552Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5103731Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5105220Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5105545Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5105642Z graph_break []
2025-12-04T11:31:57.5105897Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________
2025-12-04T11:31:57.5106090Z Traceback (most recent call last):
2025-12-04T11:31:57.5106504Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa
2025-12-04T11:31:57.5106630Z     torch.compile(fn, fullgraph=True)(x)
2025-12-04T11:31:57.5107168Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.5107275Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5107653Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn
2025-12-04T11:31:57.5107792Z     def fn(x):
2025-12-04T11:31:57.5108207Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.5108328Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5108783Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.5108898Z     return compiled_fn(full_args)
2025-12-04T11:31:57.5109500Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.5109637Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.5110244Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.5110357Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.5110913Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.5111060Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.5111598Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.5111713Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.5112273Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.5112383Z     outs = compiled_fn(args)
2025-12-04T11:31:57.5112844Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.5112972Z     return self.current_callable(inputs)
2025-12-04T11:31:57.5113365Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.5113487Z     out = model(new_inputs)
2025-12-04T11:31:57.5113954Z   File "/tmp/tmpdz9eef4b/27/c27yqyxljiol2gqdvi4ib2hnzms6hh5nu6tdhe5dae575qpbziz5.py", line 227, in call
2025-12-04T11:31:57.5114322Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.5114435Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.5114675Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5114683Z 
2025-12-04T11:31:57.5114906Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5115404Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5115413Z 
2025-12-04T11:31:57.5115676Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5115906Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5116103Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5117479Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5117825Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5117924Z graph_break []
2025-12-04T11:31:57.5118153Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5118312Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5118656Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5120005Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5120149Z graph_break []
2025-12-04T11:31:57.5120306Z =================================== FAILURES ===================================
2025-12-04T11:31:57.5120568Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________
2025-12-04T11:31:57.5120700Z Traceback (most recent call last):
2025-12-04T11:31:57.5121114Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa
2025-12-04T11:31:57.5121246Z     torch.compile(fn, fullgraph=True)(x)
2025-12-04T11:31:57.5121736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.5121846Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5122210Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn
2025-12-04T11:31:57.5122319Z     def fn(x):
2025-12-04T11:31:57.5122833Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.5122960Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5123421Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.5123540Z     return compiled_fn(full_args)
2025-12-04T11:31:57.5124141Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.5124281Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.5124892Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.5125014Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.5125576Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.5125728Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.5126271Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.5126385Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.5126941Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.5127049Z     outs = compiled_fn(args)
2025-12-04T11:31:57.5127509Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.5127635Z     return self.current_callable(inputs)
2025-12-04T11:31:57.5128089Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.5128211Z     out = model(new_inputs)
2025-12-04T11:31:57.5128681Z   File "/tmp/tmpdu2grb28/xe/cxet3htjci5kxwcdyfvvf4robtutuvgi2ijy7r2fmo3f6oiavm5f.py", line 227, in call
2025-12-04T11:31:57.5129037Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.5129164Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.5129402Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5129438Z 
2025-12-04T11:31:57.5129665Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5130157Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5130191Z 
2025-12-04T11:31:57.5130454Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5130679Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5130870Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5132226Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5132528Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5132626Z graph_break []
2025-12-04T11:31:57.5132848Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5133009Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5133317Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5134664Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5134760Z graph_break []
2025-12-04T11:31:57.5134984Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5135140Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5135444Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5136907Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5137009Z graph_break []
2025-12-04T11:31:57.5137789Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-38b03205b4b4e8b2.xml -
2025-12-04T11:31:57.5137959Z =========================== short test summary info ============================
2025-12-04T11:31:57.5138663Z FAILED [1.3130s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda - RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5138669Z 
2025-12-04T11:31:57.5138879Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5139374Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5139396Z 
2025-12-04T11:31:57.5139655Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5139858Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:31:57.5140070Z ================== 1 failed, 17 deselected, 2 rerun in 7.40s ===================
2025-12-04T11:31:57.5140170Z Got exit code 1
2025-12-04T11:31:57.5140273Z Retrying single test...
2025-12-04T11:31:57.5140879Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-4b997b321b918bd4.xml
2025-12-04T11:31:57.5141065Z ============================= test session starts ==============================
2025-12-04T11:31:57.5141423Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:31:57.5141528Z cachedir: .pytest_cache
2025-12-04T11:31:57.5142067Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:31:57.5142199Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:31:57.5142304Z configfile: pytest.ini
2025-12-04T11:31:57.5142883Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:31:57.5143136Z collecting ... collected 32 items / 31 deselected / 1 selected
2025-12-04T11:31:57.5143714Z stepcurrent: skipping 17 already run items. Running only test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda
2025-12-04T11:31:57.5143839Z Running 1 items in this shard
2025-12-04T11:31:57.5143844Z 
2025-12-04T11:31:57.5144701Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:30:33.730386338 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor)
2025-12-04T11:31:57.5145212Z [W1204 11:30:33.730410993 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.5145229Z 
2025-12-04T11:31:57.5145358Z ('RERUN', {'yellow': True}) [20.0918s] [100%]
2025-12-04T11:31:57.5146250Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:30:50.640929219 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.5146256Z 
2025-12-04T11:31:57.5146396Z ('RERUN', {'yellow': True}) [1.3430s] [100%]
2025-12-04T11:31:57.5147283Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:30:51.970355486 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.5147288Z 
2025-12-04T11:31:57.5147401Z FAILED [1.3269s] [100%]
2025-12-04T11:31:57.5147406Z 
2025-12-04T11:31:57.5147543Z ==================================== RERUNS ====================================
2025-12-04T11:31:57.5147801Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________
2025-12-04T11:31:57.5147930Z Traceback (most recent call last):
2025-12-04T11:31:57.5148343Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa
2025-12-04T11:31:57.5148472Z     torch.compile(fn, fullgraph=True)(x)
2025-12-04T11:31:57.5148961Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.5149070Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5149446Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn
2025-12-04T11:31:57.5149540Z     def fn(x):
2025-12-04T11:31:57.5149956Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.5150076Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5150542Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.5150671Z     return compiled_fn(full_args)
2025-12-04T11:31:57.5151285Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.5151426Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.5152034Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.5152150Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.5152734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.5152882Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.5153422Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.5153581Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.5154130Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.5154271Z     outs = compiled_fn(args)
2025-12-04T11:31:57.5154734Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.5154860Z     return self.current_callable(inputs)
2025-12-04T11:31:57.5155255Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.5155374Z     out = model(new_inputs)
2025-12-04T11:31:57.5155851Z   File "/tmp/tmpsehuk76x/mm/cmmol6g33c64qnicaudgkpdgbxfisiphdlux2cdngrz2csklmdql.py", line 227, in call
2025-12-04T11:31:57.5156219Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.5156334Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.5156571Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5157322Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.5157432Z C++ CapturedTraceback:
2025-12-04T11:31:57.5158723Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.5159199Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.5159523Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.5160327Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.5161581Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5163376Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5170403Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5173901Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.5175273Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5175890Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5180151Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5180878Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5181886Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.5185491Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.5186152Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.5186884Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5187725Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5192551Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.5192830Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.5193150Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.5193430Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5193684Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.5194093Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5194396Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5194684Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5194993Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5195425Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5195801Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5196195Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5196585Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5196990Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5197383Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5197691Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5197972Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5198264Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5198671Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5199033Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5199430Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5199803Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5200061Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5200435Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5200730Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5201259Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5201572Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5201967Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5202345Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5202822Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5203186Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5203595Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5203959Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5204209Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5204349Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5204709Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5205117Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5205238Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.5205355Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5205732Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5206060Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5206437Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5206828Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5207187Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5207526Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5207778Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5208137Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5208476Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5208725Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5209145Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5209429Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5209678Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5210052Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5210336Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5210595Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5210959Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5211208Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5211583Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5211840Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5212202Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5212466Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5212833Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5213245Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5213605Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5214001Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5214372Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5214769Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5215140Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5215530Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5215893Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5216296Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5216657Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5216958Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5217208Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5217595Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5217951Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5218247Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5218530Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5218880Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5219286Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5219668Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5220101Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5220471Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5220775Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5221147Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5221557Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5221925Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5222324Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5222704Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5223055Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5223364Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5223666Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5223931Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5224199Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5224567Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5224970Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5225353Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5225757Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5226135Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5226395Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5226758Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5227173Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5227537Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5227950Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5228315Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5228571Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5228950Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5229394Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5229766Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5230179Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5230551Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5230939Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5231247Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5231572Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5231886Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5232295Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5232710Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5232968Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5233338Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5233760Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5234129Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5234543Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5234917Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5235269Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5235590Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5235885Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5236185Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5236605Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5236974Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5237388Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5237757Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5238158Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5238542Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5238804Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5239182Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5239587Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5239951Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5240366Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5240733Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5241092Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5241422Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5241713Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5242022Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5242519Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5242928Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5243351Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5243751Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5244170Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5244570Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5244970Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5245350Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5245750Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5246133Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5246419Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.5246720Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.5246995Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.5247279Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.5247636Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.5247955Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.5248239Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.5248516Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.5248776Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.5248967Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.5249081Z #184 _start from ??:0
2025-12-04T11:31:57.5249198Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.5249204Z 
2025-12-04T11:31:57.5249210Z 
2025-12-04T11:31:57.5249436Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5249937Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5249945Z 
2025-12-04T11:31:57.5250219Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5250436Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5250596Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5251965Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5252273Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5252384Z graph_break []
2025-12-04T11:31:57.5252670Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________
2025-12-04T11:31:57.5252793Z Traceback (most recent call last):
2025-12-04T11:31:57.5253222Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa
2025-12-04T11:31:57.5253351Z     torch.compile(fn, fullgraph=True)(x)
2025-12-04T11:31:57.5253830Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.5253985Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5254348Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn
2025-12-04T11:31:57.5254454Z     def fn(x):
2025-12-04T11:31:57.5254870Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.5255007Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5255479Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.5255629Z     return compiled_fn(full_args)
2025-12-04T11:31:57.5256213Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.5256367Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.5256962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.5257092Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.5257641Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.5257775Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.5258331Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.5258446Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.5258998Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.5259118Z     outs = compiled_fn(args)
2025-12-04T11:31:57.5259565Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.5259704Z     return self.current_callable(inputs)
2025-12-04T11:31:57.5260099Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.5260201Z     out = model(new_inputs)
2025-12-04T11:31:57.5260665Z   File "/tmp/tmp_axzmm2t/lf/clfxqxcsmsfumbhzv7b3bld7fpumd3g7khb5qhbxg4xoqjclpb7f.py", line 227, in call
2025-12-04T11:31:57.5261023Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.5261146Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.5261385Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5262114Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.5262236Z C++ CapturedTraceback:
2025-12-04T11:31:57.5263519Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.5264002Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.5264335Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.5265167Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.5266460Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5268137Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5275155Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5278590Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.5280003Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5280605Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5284930Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5285724Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5286679Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.5290269Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.5290890Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.5291628Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5292445Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5297404Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.5297685Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.5298030Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.5298293Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5298594Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.5298957Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5299260Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5299547Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5299841Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5300252Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5300614Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5301187Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5301565Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5301958Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5302326Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5302620Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5302904Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5303210Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5303603Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5303971Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5304363Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5304724Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5304986Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5305346Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5305650Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5305940Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5306234Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5306637Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5306997Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5307453Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5307829Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5308222Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5308590Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5308881Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5309003Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5309380Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5309814Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5309943Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.5310059Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5310474Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5310736Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5311096Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5311486Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5311858Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5312141Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5312405Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5312763Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5313046Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5313311Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5313673Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5313963Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5314216Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5314576Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5314867Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5315119Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5315477Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5315736Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5316095Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5316350Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5316708Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5316956Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5317325Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5317715Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5318096Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5318514Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5318876Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5319280Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5319638Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5320060Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5320430Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5320819Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5321221Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5321506Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5321784Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5322153Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5322587Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5322900Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5323181Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5323472Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5323893Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5324261Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5324675Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5325041Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5325300Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5325680Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5326082Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5326448Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5326859Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5327222Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5327580Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5327883Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5328178Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5328450Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5328707Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5329087Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5329494Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5329862Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5330310Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5330680Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5330935Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5331311Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5331741Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5332119Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5332518Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5332914Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5333179Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5333578Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5333986Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5334350Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5334750Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5335124Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5335472Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5335789Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5336075Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5336374Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5336784Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5337146Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5337402Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5337781Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5338181Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5338560Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5338960Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5339328Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5339688Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5339989Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5340289Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5340589Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5340992Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5341370Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5341770Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5342177Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5342577Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5342944Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5343218Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5343615Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5344022Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5344400Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5344835Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5345222Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5345601Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5345907Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5346211Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5346514Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5346930Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5347299Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5347705Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5348087Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5348494Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5348879Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5349280Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5349650Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5350066Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5350432Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5350737Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.5351033Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.5351300Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.5351591Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.5351931Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.5352250Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.5352547Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.5352809Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.5353085Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.5353279Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.5353380Z #184 _start from ??:0
2025-12-04T11:31:57.5353509Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.5353517Z 
2025-12-04T11:31:57.5353552Z 
2025-12-04T11:31:57.5353765Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5354263Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5354283Z 
2025-12-04T11:31:57.5354543Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5354787Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5354956Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5356300Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5356655Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5356776Z graph_break []
2025-12-04T11:31:57.5356984Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5357156Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5357448Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5358929Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5359022Z graph_break []
2025-12-04T11:31:57.5359159Z =================================== FAILURES ===================================
2025-12-04T11:31:57.5359432Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________
2025-12-04T11:31:57.5359551Z Traceback (most recent call last):
2025-12-04T11:31:57.5359964Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa
2025-12-04T11:31:57.5360101Z     torch.compile(fn, fullgraph=True)(x)
2025-12-04T11:31:57.5360575Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.5360696Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5361053Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn
2025-12-04T11:31:57.5361143Z     def fn(x):
2025-12-04T11:31:57.5361570Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.5361674Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5362132Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.5362257Z     return compiled_fn(full_args)
2025-12-04T11:31:57.5362953Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.5363103Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.5363699Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.5363810Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.5364369Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.5364500Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.5365051Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.5365203Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.5365745Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.5365863Z     outs = compiled_fn(args)
2025-12-04T11:31:57.5366307Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.5366432Z     return self.current_callable(inputs)
2025-12-04T11:31:57.5366862Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.5366967Z     out = model(new_inputs)
2025-12-04T11:31:57.5367457Z   File "/tmp/tmpzrap4p58/ai/caigvbcolirtxl4f37pdfnsretrfde6fwtpjzyg4qdu2djzyyzek.py", line 227, in call
2025-12-04T11:31:57.5367842Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.5367952Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.5368235Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5368969Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.5369075Z C++ CapturedTraceback:
2025-12-04T11:31:57.5370362Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.5370828Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.5371162Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.5371951Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.5373215Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5374879Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5381886Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5385340Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.5386760Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5387360Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5391595Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5392336Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5393290Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.5396938Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.5397584Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.5398357Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5399171Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5404334Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.5404612Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.5404946Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.5405207Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5405476Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.5405843Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5406138Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5406435Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5406729Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5407122Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5407501Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5407894Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5408335Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5408733Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5409091Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5409398Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5409737Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5410045Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5410433Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5410832Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5411240Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5411632Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5411893Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5412252Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5412547Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5412843Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5413133Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5413528Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5413893Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5414288Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5414663Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5415051Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5415409Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5415676Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5415795Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5416166Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5416562Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5416682Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.5416807Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5417167Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5417421Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5417786Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5418181Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5418551Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5418834Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5419084Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5419454Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5419765Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5420026Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5420384Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5420664Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5420946Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5421306Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5421589Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5421877Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5422230Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5422517Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5422879Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5423123Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5423501Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5423747Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5424113Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5424509Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5424867Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5425278Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5425639Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5426026Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5426393Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5426783Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5427149Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5427543Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5427900Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5428194Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5428443Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5428812Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5429149Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5429441Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5429736Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5430030Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5430456Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5430827Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5431265Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5431650Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5431907Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5432274Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5432719Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5433088Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5433537Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5433906Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5434285Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5434600Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5434893Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5435176Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5435438Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5435809Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5436230Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5436606Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5437025Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5437395Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5437653Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5438040Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5438447Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5438814Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5439232Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5439603Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5439879Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5440248Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5440651Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5441031Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5441434Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5441819Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5442165Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5442551Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5442857Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5443203Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5443622Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5443989Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5444247Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5444660Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5445064Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5445463Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5445877Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5446277Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5446636Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5446935Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5447224Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5447535Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5447940Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5448327Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5448728Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5449095Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5449507Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5449873Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5450140Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5450509Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5450910Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5451290Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5451690Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5452060Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5452423Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5452726Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5453033Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5453335Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5453737Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5454115Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5454521Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5454946Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5455354Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5455720Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5456133Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5456529Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5456948Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5457322Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5457639Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.5457958Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.5458253Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.5458534Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.5458890Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.5459209Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.5459507Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.5459772Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.5460038Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.5460246Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.5460344Z #184 _start from ??:0
2025-12-04T11:31:57.5460463Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.5460487Z 
2025-12-04T11:31:57.5460492Z 
2025-12-04T11:31:57.5460710Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5461212Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5461217Z 
2025-12-04T11:31:57.5461494Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5461718Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5461877Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5463247Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5463555Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5463665Z graph_break []
2025-12-04T11:31:57.5463882Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5464040Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5464349Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5465814Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5465924Z graph_break []
2025-12-04T11:31:57.5466135Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5466325Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5466634Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5468128Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5468239Z graph_break []
2025-12-04T11:31:57.5469016Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-4b997b321b918bd4.xml -
2025-12-04T11:31:57.5469236Z =========================== short test summary info ============================
2025-12-04T11:31:57.5469927Z FAILED [1.3269s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda - RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5470694Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.5470814Z C++ CapturedTraceback:
2025-12-04T11:31:57.5472085Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.5472573Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.5472897Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.5473687Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.5474954Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5476630Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5483767Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5487228Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.5488652Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5489254Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5493475Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5494217Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5495179Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.5498914Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.5499566Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.5500352Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5501357Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5506164Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.5506446Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.5506762Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.5507027Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5507299Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.5507664Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5507977Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5508263Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5508556Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5508968Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5509333Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5509738Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5510166Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5510558Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5510934Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5511233Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5511558Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5511865Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5512300Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5512668Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5513064Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5513468Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5513732Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5514094Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5514406Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5514689Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5514981Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5515391Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5515752Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5516165Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5516522Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5516919Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5517294Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5517545Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5517666Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5518046Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5518441Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5518572Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.5518689Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5519052Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5519321Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5519681Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5520076Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5520446Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5520734Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5520997Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5521391Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5521679Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5521941Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5522304Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5522698Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5522991Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5523357Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5523688Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5523936Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5524316Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5524596Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5524956Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5525221Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5525582Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5525829Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5526209Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5526611Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5526986Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5527386Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5527747Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5528152Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5528512Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5528921Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5529284Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5529681Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5530055Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5530347Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5530599Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5530974Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5531317Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5531632Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5531918Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5532214Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5532643Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5533045Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5533471Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5533846Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5534106Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5534519Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5534928Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5535309Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5535736Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5536106Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5536500Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5536806Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5537096Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5537378Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5537637Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5538022Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5538426Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5538794Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5539209Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5539577Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5539850Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5540215Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5540617Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5540997Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5541399Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5541780Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5542039Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5542408Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5542818Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5543182Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5543583Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5543962Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5544308Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5544624Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5544945Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5545248Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5545666Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5546033Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5546331Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5546699Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5547099Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5547528Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5547933Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5548345Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5548690Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5548991Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5549301Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5549601Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5550006Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5550390Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5550792Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5551176Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5551577Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5551945Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5552216Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5552589Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5552999Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5553367Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5553765Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5554149Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5554494Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5554805Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5555094Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5555393Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5555806Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5556177Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5556576Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5556985Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5557391Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5557769Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5558172Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5558569Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5558985Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5559382Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5559680Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.5559981Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.5560277Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.5560565Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.5560909Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.5561228Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.5561522Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.5561790Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.5562068Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.5562262Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.5562360Z #184 _start from ??:0
2025-12-04T11:31:57.5562582Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.5562590Z 
2025-12-04T11:31:57.5562594Z 
2025-12-04T11:31:57.5562811Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5563333Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5563339Z 
2025-12-04T11:31:57.5563601Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5563782Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:31:57.5563992Z ================== 1 failed, 31 deselected, 2 rerun in 22.80s ==================
2025-12-04T11:31:57.5564091Z Got exit code 1
2025-12-04T11:31:57.5564195Z Retrying single test...
2025-12-04T11:31:57.5564812Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-c0ee399e0a993179.xml
2025-12-04T11:31:57.5564971Z ============================= test session starts ==============================
2025-12-04T11:31:57.5565334Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:31:57.5565443Z cachedir: .pytest_cache
2025-12-04T11:31:57.5565946Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:31:57.5566081Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:31:57.5566186Z configfile: pytest.ini
2025-12-04T11:31:57.5566772Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:31:57.5566984Z collecting ... collected 32 items / 31 deselected / 1 selected
2025-12-04T11:31:57.5567567Z stepcurrent: skipping 17 already run items. Running only test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda
2025-12-04T11:31:57.5567693Z Running 1 items in this shard
2025-12-04T11:31:57.5567741Z 
2025-12-04T11:31:57.5568595Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:31:08.963280782 unwind.cpp:219] Warning: Unsupported unwinding pattern: Address not in range (function unwinderFor)
2025-12-04T11:31:57.5569119Z [W1204 11:31:08.963304307 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.5569125Z 
2025-12-04T11:31:57.5569284Z ('RERUN', {'yellow': True}) [20.3320s] [100%]
2025-12-04T11:31:57.5570177Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:31:25.144290228 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.5570213Z 
2025-12-04T11:31:57.5570351Z ('RERUN', {'yellow': True}) [1.3565s] [100%]
2025-12-04T11:31:57.5571242Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda [W1204 11:31:26.435756544 Module.cpp:201] symbolizing C++ stack trace for exception; if this hangs, rerun with TORCH_DISABLE_ADDR2LINE=1...
2025-12-04T11:31:57.5571276Z 
2025-12-04T11:31:57.5571389Z FAILED [1.2890s] [100%]
2025-12-04T11:31:57.5571394Z 
2025-12-04T11:31:57.5571535Z ==================================== RERUNS ====================================
2025-12-04T11:31:57.5571796Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________
2025-12-04T11:31:57.5571928Z Traceback (most recent call last):
2025-12-04T11:31:57.5572341Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa
2025-12-04T11:31:57.5572479Z     torch.compile(fn, fullgraph=True)(x)
2025-12-04T11:31:57.5572962Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.5573071Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5573445Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn
2025-12-04T11:31:57.5573541Z     def fn(x):
2025-12-04T11:31:57.5573960Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.5574077Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5574538Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.5574666Z     return compiled_fn(full_args)
2025-12-04T11:31:57.5575257Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.5575397Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.5576011Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.5576125Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.5576700Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.5576835Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.5577379Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.5577509Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.5578063Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.5578173Z     outs = compiled_fn(args)
2025-12-04T11:31:57.5578640Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.5578770Z     return self.current_callable(inputs)
2025-12-04T11:31:57.5579180Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.5579287Z     out = model(new_inputs)
2025-12-04T11:31:57.5579796Z   File "/tmp/tmpvh293qd0/z7/cz75itwfjcnm4yvpxo35zryxuqyb7drx2ljgdlwurbj2o2ooh7ar.py", line 227, in call
2025-12-04T11:31:57.5580166Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.5580281Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.5580521Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5581295Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.5581406Z C++ CapturedTraceback:
2025-12-04T11:31:57.5582691Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.5583227Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.5583566Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.5584357Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.5585608Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5587300Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5594276Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5597753Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.5599189Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5599791Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5604359Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5605094Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5606068Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.5609750Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.5610403Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.5611178Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5612018Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5616918Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.5617251Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.5617586Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.5617854Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5618111Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.5618501Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5618804Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5619094Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5619401Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5619803Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5620184Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5620582Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5620944Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5621360Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5621724Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5622036Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5622328Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5622623Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5623072Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5623437Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5623844Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5624239Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5624496Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5624874Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5625199Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5625483Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5625794Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5626223Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5626607Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5627000Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5627360Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5627766Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5628128Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5628399Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5628519Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5628883Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5629294Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5629412Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.5629525Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5629900Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5630151Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5630522Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5630918Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5631278Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5631578Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5631829Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5632201Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5632484Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5632734Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5633106Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5633386Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5633639Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5634013Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5634331Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5634597Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5634961Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5635209Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5635610Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5635861Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5636233Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5636511Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5636877Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5637317Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5637677Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5638072Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5638449Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5638843Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5639214Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5639611Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5639975Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5640380Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5640739Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5641035Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5641288Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5641650Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5642003Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5642302Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5642702Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5643001Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5643409Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5643790Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5644192Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5644564Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5644838Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5645209Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5645626Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5646051Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5646455Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5646838Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5647188Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5647535Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5647829Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5648096Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5648396Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5648768Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5649216Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5649584Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5649987Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5650366Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5650625Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5650994Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5651411Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5651777Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5652193Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5652560Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5652817Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5653200Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5653603Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5653980Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5654385Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5654750Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5655112Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5655415Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5655721Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5656023Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5656425Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5656803Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5657061Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5657429Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5657877Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5658246Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5658660Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5659025Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5659399Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5659716Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5660043Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5660354Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5660758Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5661156Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5661568Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5661934Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5662350Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5662713Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5662969Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5663350Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5663751Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5664120Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5664531Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5664898Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5665259Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5665559Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5665849Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5666161Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5666560Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5666944Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5667347Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5667714Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5668129Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5668495Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5668950Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5669317Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5669754Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5670139Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5670423Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.5670724Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.5671000Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.5671316Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.5671678Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.5672030Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.5672318Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.5672603Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.5672900Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.5673107Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.5673207Z #184 _start from ??:0
2025-12-04T11:31:57.5673325Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.5673331Z 
2025-12-04T11:31:57.5673336Z 
2025-12-04T11:31:57.5673566Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5674070Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5674076Z 
2025-12-04T11:31:57.5674337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5674572Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5674733Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5676113Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5676420Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5676532Z graph_break []
2025-12-04T11:31:57.5676792Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________
2025-12-04T11:31:57.5676913Z Traceback (most recent call last):
2025-12-04T11:31:57.5677340Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa
2025-12-04T11:31:57.5677470Z     torch.compile(fn, fullgraph=True)(x)
2025-12-04T11:31:57.5677952Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.5678080Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5678442Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn
2025-12-04T11:31:57.5678535Z     def fn(x):
2025-12-04T11:31:57.5678966Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.5679074Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5679549Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.5679664Z     return compiled_fn(full_args)
2025-12-04T11:31:57.5680252Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.5680405Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.5681045Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.5681181Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.5681736Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.5681869Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.5682574Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.5682697Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.5683243Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.5683399Z     outs = compiled_fn(args)
2025-12-04T11:31:57.5683850Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.5683990Z     return self.current_callable(inputs)
2025-12-04T11:31:57.5684520Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.5684628Z     out = model(new_inputs)
2025-12-04T11:31:57.5685117Z   File "/tmp/tmp0q8t7vtj/37/c37ypiaxo5cnfzthmtsb4kk4r2dlmjwvcr4olm7aszoktcoqoufn.py", line 227, in call
2025-12-04T11:31:57.5685476Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.5685595Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.5685851Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5686595Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.5686721Z C++ CapturedTraceback:
2025-12-04T11:31:57.5687997Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.5688489Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.5688819Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.5689610Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.5690886Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5692562Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5699570Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5703229Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.5704612Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5705215Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5709418Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5710239Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5711202Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.5714842Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.5715527Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.5716262Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5717084Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5721920Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.5722202Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.5722590Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.5722856Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5723127Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.5723496Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5723809Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5724094Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5724433Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5724847Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5725209Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5725609Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5726017Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5726414Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5726821Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5727118Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5727407Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5727745Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5728142Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5728519Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5728916Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5729277Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5729546Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5729906Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5730205Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5730506Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5730801Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5731213Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5731577Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5731974Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5732346Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5732744Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5733118Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5733377Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5733498Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5733871Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5734262Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5734384Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.5734510Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5734876Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5735143Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5735504Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5735927Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5736301Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5736585Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5736847Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5737236Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5737522Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5737784Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5738189Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5738472Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5738736Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5739141Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5739435Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5739684Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5740044Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5740306Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5740669Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5740933Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5741295Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5741549Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5741924Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5742324Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5742685Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5743096Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5743456Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5743867Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5744228Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5744621Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5744995Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5745387Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5745758Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5746041Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5746291Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5746663Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5747004Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5747345Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5747632Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5747925Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5748347Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5748745Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5749152Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5749532Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5749820Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5750199Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5750640Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5751010Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5751422Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5751789Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5752152Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5752452Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5752744Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5753023Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5753281Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5753664Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5754064Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5754432Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5754847Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5755213Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5755472Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5755849Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5756250Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5756636Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5757038Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5757405Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5757676Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5758043Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5758456Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5758829Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5759261Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5759644Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5759989Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5760305Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5760633Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5760934Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5761351Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5761752Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5762010Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5762534Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5762943Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5763324Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5763725Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5764092Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5764452Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5764755Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5765061Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5765364Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5765765Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5766147Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5766545Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5766928Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5767327Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5767697Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5767966Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5768334Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5768737Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5769117Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5769517Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5769899Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5770244Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5770565Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5770869Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5771209Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5771629Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5771998Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5772406Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5772818Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5773222Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5773636Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5774035Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5774407Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5774856Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5775221Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5775508Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.5775826Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.5776091Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.5776386Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.5776734Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.5777053Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.5777356Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.5777624Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.5777900Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.5778092Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.5778194Z #184 _start from ??:0
2025-12-04T11:31:57.5778335Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.5778340Z 
2025-12-04T11:31:57.5778345Z 
2025-12-04T11:31:57.5778562Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5779061Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5779084Z 
2025-12-04T11:31:57.5779346Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5779567Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5779743Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5781102Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5781425Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5781526Z graph_break []
2025-12-04T11:31:57.5781740Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5781916Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5782212Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5783709Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5783820Z graph_break []
2025-12-04T11:31:57.5784004Z =================================== FAILURES ===================================
2025-12-04T11:31:57.5784273Z ___________________ TestUnbackedSymintsCUDA.test_sdpfa_cuda ____________________
2025-12-04T11:31:57.5784395Z Traceback (most recent call last):
2025-12-04T11:31:57.5784848Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 524, in test_sdpfa
2025-12-04T11:31:57.5784995Z     torch.compile(fn, fullgraph=True)(x)
2025-12-04T11:31:57.5785483Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 926, in compile_wrapper
2025-12-04T11:31:57.5785641Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5786008Z   File "/var/lib/jenkins/workspace/test/inductor/test_unbacked_symints.py", line 509, in fn
2025-12-04T11:31:57.5786100Z     def fn(x):
2025-12-04T11:31:57.5786532Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 1154, in _fn
2025-12-04T11:31:57.5786643Z     return fn(*args, **kwargs)
2025-12-04T11:31:57.5787104Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1148, in forward
2025-12-04T11:31:57.5787231Z     return compiled_fn(full_args)
2025-12-04T11:31:57.5787819Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 357, in runtime_wrapper
2025-12-04T11:31:57.5787975Z     all_outs = call_func_at_runtime_with_args(
2025-12-04T11:31:57.5788573Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/utils.py", line 134, in call_func_at_runtime_with_args
2025-12-04T11:31:57.5788694Z     out = normalize_as_list(f(args))
2025-12-04T11:31:57.5789260Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 1962, in __call__
2025-12-04T11:31:57.5789391Z     return self.compiled_fn(*args, **kwargs)
2025-12-04T11:31:57.5789937Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 531, in wrapper
2025-12-04T11:31:57.5790066Z     return compiled_fn(runtime_args)
2025-12-04T11:31:57.5790614Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 729, in inner_fn
2025-12-04T11:31:57.5790738Z     outs = compiled_fn(args)
2025-12-04T11:31:57.5791187Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/output_code.py", line 627, in __call__
2025-12-04T11:31:57.5791311Z     return self.current_callable(inputs)
2025-12-04T11:31:57.5791724Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/utils.py", line 3247, in run
2025-12-04T11:31:57.5791831Z     out = model(new_inputs)
2025-12-04T11:31:57.5792326Z   File "/tmp/tmp5ecai3s6/jl/cjlyyplawpcwhafdzduzmiv34giqznfuxr7m7doagkglbwv2n7uy.py", line 227, in call
2025-12-04T11:31:57.5792685Z   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_ops.py", line 836, in __call__
2025-12-04T11:31:57.5792801Z     return self._op(*args, **kwargs)
2025-12-04T11:31:57.5793055Z RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5793787Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.5793898Z C++ CapturedTraceback:
2025-12-04T11:31:57.5795219Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.5795693Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.5796029Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.5796845Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.5798141Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5799855Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5807171Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5810667Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.5812088Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5812689Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5816947Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5817715Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5818687Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.5822279Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.5822905Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.5823646Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5824466Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5829341Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.5829664Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.5829995Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.5830262Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5830529Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.5830894Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5831197Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5831502Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5831800Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5832200Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5832576Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5832973Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5833347Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5833742Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5834102Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5834411Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5834698Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5835003Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5835396Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5835754Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5836165Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5836529Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5836782Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5837155Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5837497Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5837797Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5838091Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5838488Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5838891Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5839285Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5839661Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5840084Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5840445Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5840737Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5840858Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5841231Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5841625Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5841743Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.5841874Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5842235Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5842546Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5842923Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5843320Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5843697Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5843986Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5844236Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5844610Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5844894Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5845141Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5845517Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5845798Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5846062Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5846422Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5846701Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5846962Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5847324Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5847585Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5847946Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5848196Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5848567Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5848862Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5849225Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5849630Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5849990Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5850427Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5850788Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5851212Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5851588Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5852016Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5852392Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5852788Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5853145Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5853443Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5853694Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5854067Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5854405Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5854705Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5855008Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5855302Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5855708Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5856090Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5856490Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5862410Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5862757Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5863143Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5863582Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5863957Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5864377Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5864749Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5865097Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5865417Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5865712Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5865981Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5866342Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5866717Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5867137Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5867506Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5867947Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5868331Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5868627Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5869004Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5869408Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5869820Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5870235Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5870601Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5870862Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5871240Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5871641Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5872027Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5872419Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5872785Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5873151Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5873453Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5873747Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5874059Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5874461Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5874844Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5875101Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5875472Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5875884Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5876249Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5876664Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5877031Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5877376Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5877693Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5877983Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5878333Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5878736Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5879102Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5879555Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5879927Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5880328Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5880737Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5880994Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5881427Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5881830Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5882193Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5882709Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5883076Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5883435Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5883737Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5884028Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5884341Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5884745Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5885128Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5885530Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5885899Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5886314Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5886683Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5887081Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5887465Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5887867Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5888250Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5888537Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.5888841Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.5889119Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.5889398Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.5889757Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.5890077Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.5890397Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.5890675Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.5890937Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.5891131Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.5891245Z #184 _start from ??:0
2025-12-04T11:31:57.5891393Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.5891402Z 
2025-12-04T11:31:57.5891407Z 
2025-12-04T11:31:57.5891635Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5892239Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5892244Z 
2025-12-04T11:31:57.5892506Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5892774Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5892941Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5894328Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_miss', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5894632Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5894730Z graph_break []
2025-12-04T11:31:57.5894961Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5895118Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5895430Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5896906Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5897002Z graph_break []
2025-12-04T11:31:57.5897228Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T11:31:57.5897382Z stats [('calls_captured', 16), ('unique_graphs', 1)]
2025-12-04T11:31:57.5897689Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('autograd_cache_saved', 1), ('ok', 1)]
2025-12-04T11:31:57.5899152Z inductor [('triton_bundler_save_kernel', 48), ('extern_calls', 10), ('async_compile_cache_miss', 6), ('benchmarking.InductorBenchmarker.benchmark', 6), ('benchmarking.InductorBenchmarker.benchmark_gpu', 6), ('pattern_matcher_count', 3), ('pattern_matcher_nodes', 3), ('async_compile_cache_hit', 3), ('fxgraph_cache_miss', 1), ('triton_bundler_save_static_autotuner', 1)]
2025-12-04T11:31:57.5899263Z graph_break []
2025-12-04T11:31:57.5900038Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-c0ee399e0a993179.xml -
2025-12-04T11:31:57.5900206Z =========================== short test summary info ============================
2025-12-04T11:31:57.5901101Z FAILED [1.2890s] inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda - RuntimeError: FlashAttention only supports Ampere GPUs or newer.
2025-12-04T11:31:57.5901836Z Exception raised from mha_fwd at /var/lib/jenkins/workspace/aten/src/ATen/native/transformers/cuda/flash_attn/flash_api.cpp:395 (most recent call first):
2025-12-04T11:31:57.5901948Z C++ CapturedTraceback:
2025-12-04T11:31:57.5903318Z #4 std::_Function_handler<std::shared_ptr<c10::LazyValue<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const> (), c10::SetStackTraceFetcher(std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>)::{lambda()#1}>::_M_invoke(std::_Any_data const&) from Logging.cpp:0
2025-12-04T11:31:57.5903795Z #5 c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) from ??:0
2025-12-04T11:31:57.5904179Z #6 c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) from ??:0
2025-12-04T11:31:57.5904966Z #7 pytorch_flash::mha_fwd(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor>&, std::optional<at::Tensor>&, float, float, bool, int, int, float, bool, std::optional<at::Generator>) from ??:0
2025-12-04T11:31:57.5906282Z #8 at::native::_flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, long, long, double, bool, bool, std::optional<double>, std::optional<long>, std::optional<long>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5908007Z #9 at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5914972Z #10 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___flash_attention_forward>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&> >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5918433Z #11 std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor> c10::callUnboxedKernelFunction<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor>, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&>(void*, c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt&&, c10::SymInt&&, double&&, bool&&, bool&&, std::optional<double>&&, std::optional<c10::SymInt>&&, std::optional<c10::SymInt>&&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) [clone .isra.0] from Operators_0.cpp:0
2025-12-04T11:31:57.5919841Z #12 at::_ops::_flash_attention_forward::call(at::Tensor const&, at::Tensor const&, at::Tensor const&, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&, c10::SymInt, c10::SymInt, double, bool, bool, std::optional<double>, std::optional<c10::SymInt>, std::optional<c10::SymInt>, std::optional<at::Tensor> const&, std::optional<at::Tensor> const&) from ??:0
2025-12-04T11:31:57.5920949Z #13 at::native::_scaled_dot_product_flash_attention_cuda(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5925264Z #14 c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from RegisterCUDA_0.cpp:0
2025-12-04T11:31:57.5926046Z #15 at::_ops::_scaled_dot_product_flash_attention::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from ??:0
2025-12-04T11:31:57.5926997Z #16 torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>) from VariableType_1.cpp:0
2025-12-04T11:31:57.5930597Z #17 c10::impl::make_boxed_from_unboxed_functor<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor> (c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double>), &torch::autograd::VariableType::(anonymous namespace)::_scaled_dot_product_flash_attention>, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, c10::SymInt, c10::SymInt, at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, bool, bool, std::optional<double> > >, false>::call(c10::OperatorKernel*, c10::OperatorHandle const&, c10::DispatchKeySet, std::vector<c10::IValue, std::allocator<c10::IValue> >*) from VariableType_1.cpp:0
2025-12-04T11:31:57.5931220Z #18 c10::Dispatcher::callBoxed(c10::OperatorHandle const&, std::vector<c10::IValue, std::allocator<c10::IValue> >*) const [clone .isra.0] from register_c10_ops.cpp:0
2025-12-04T11:31:57.5931957Z #19 torch::jit::invokeOperatorFromPython(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, pybind11::args const&, pybind11::kwargs const&, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5932805Z #20 torch::jit::_get_operation_for_overload_or_packet(c10::ArrayRef<std::shared_ptr<torch::jit::Operator> >, c10::Symbol, pybind11::args const&, pybind11::kwargs const&, bool, std::optional<c10::DispatchKey>) from :0
2025-12-04T11:31:57.5937680Z #21 pybind11::cpp_function::initialize<torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}, pybind11::object, pybind11::args const&, pybind11::kwargs const&>(torch::jit::initJITBindings(_object*)::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)#218}::operator()(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const::{lambda(pybind11::args const&, pybind11::kwargs const&)#1}&&, pybind11::object (*)(pybind11::args const&, pybind11::kwargs const&))::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) from init.cpp:0
2025-12-04T11:31:57.5938014Z #22 pybind11::cpp_function::dispatcher(_object*, _object*, _object*) from :0
2025-12-04T11:31:57.5938343Z #23 cfunction_call from /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
2025-12-04T11:31:57.5938609Z #24 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5938876Z #25 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5917
2025-12-04T11:31:57.5939245Z #26 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5939547Z #27 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5939853Z #28 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5940150Z #29 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5940563Z #30 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5940927Z #31 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5941326Z #32 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5941701Z #33 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5942095Z #34 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5942459Z #35 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5942775Z #36 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5943060Z #37 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5943366Z #38 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5943759Z #39 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5944120Z #40 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5944532Z #41 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5944892Z #42 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5945158Z #43 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5945520Z #44 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5945859Z #45 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5946155Z #46 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5946450Z #47 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5946843Z #48 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5947242Z #49 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5947639Z #50 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5948040Z #51 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5948433Z #52 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5948793Z #53 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5949086Z #54 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5949207Z #55 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5949579Z #56 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5949974Z #57 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5950095Z #58 dynamo_eval_custom_code from ??:0
2025-12-04T11:31:57.5950223Z #59 dynamo__custom_eval_frame from :0
2025-12-04T11:31:57.5950585Z #60 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5950840Z #61 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5951215Z #62 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5951613Z #63 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5951991Z #64 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5952277Z #65 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5952527Z #66 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5952901Z #67 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5953186Z #68 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5953451Z #69 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5953812Z #70 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5954098Z #71 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5954362Z #72 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5954722Z #73 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5955002Z #74 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5955264Z #75 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5955626Z #76 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5955888Z #77 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5956249Z #78 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5956497Z #79 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5956904Z #80 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5957160Z #81 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5957531Z #82 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5957930Z #83 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5958317Z #84 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5958728Z #85 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5959090Z #86 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5959530Z #87 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5959896Z #88 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5960316Z #89 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5960689Z #90 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5961085Z #91 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5961445Z #92 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5961744Z #93 PyVectorcall_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:267
2025-12-04T11:31:57.5961996Z #94 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5962373Z #95 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5962791Z #96 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5963097Z #97 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5963396Z #98 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5963689Z #99 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5964113Z #100 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5964485Z #101 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5964888Z #102 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5965275Z #103 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5965536Z #104 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5965918Z #105 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5966324Z #106 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5966692Z #107 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5967106Z #108 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5967475Z #109 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5967826Z #110 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5968143Z #111 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5968434Z #112 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5968752Z #113 _PyObject_Call from /usr/local/src/conda/python-3.10.14/Objects/call.c:305
2025-12-04T11:31:57.5969014Z #114 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5969383Z #115 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5969799Z #116 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5970193Z #117 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5970608Z #118 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5970976Z #119 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5971263Z #120 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5971648Z #121 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5972082Z #122 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5972452Z #123 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5972871Z #124 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5973245Z #125 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5973515Z #126 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5973884Z #127 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5974288Z #128 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5974667Z #129 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5975071Z #130 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5975451Z #131 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5975798Z #132 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5976101Z #133 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5976405Z #134 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5976705Z #135 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5977125Z #136 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5977490Z #137 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5977751Z #138 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5978131Z #139 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5978533Z #140 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5978901Z #141 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5979315Z #142 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5979684Z #143 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5980045Z #144 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5980345Z #145 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5980665Z #146 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5980978Z #147 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5981381Z #148 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5981760Z #149 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5982209Z #150 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5982578Z #151 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5982990Z #152 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5983388Z #153 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5983661Z #154 do_call_core from /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
2025-12-04T11:31:57.5984056Z #155 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5984459Z #156 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5984839Z #157 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5985243Z #158 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5985613Z #159 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5985972Z #160 _PyObject_FastCallDictTstate from /usr/local/src/conda/python-3.10.14/Objects/call.c:153
2025-12-04T11:31:57.5986278Z #161 _PyObject_Call_Prepend from /usr/local/src/conda/python-3.10.14/Objects/call.c:431
2025-12-04T11:31:57.5986582Z #162 slot_tp_call from /usr/local/src/conda/python-3.10.14/Objects/typeobject.c:7494
2025-12-04T11:31:57.5986884Z #163 _PyObject_MakeTpCall from /usr/local/src/conda/python-3.10.14/Objects/call.c:215
2025-12-04T11:31:57.5987285Z #164 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
2025-12-04T11:31:57.5987666Z #165 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5988068Z #166 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5988447Z #167 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5988847Z #168 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5989214Z #169 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5989626Z #170 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5989998Z #171 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5990410Z #172 _PyObject_VectorcallTstate from /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
2025-12-04T11:31:57.5990779Z #173 _PyEval_EvalFrame from /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
2025-12-04T11:31:57.5991062Z #174 PyEval_EvalCode from /usr/local/src/conda/python-3.10.14/Python/ceval.c:1134
2025-12-04T11:31:57.5991371Z #175 run_eval_code_obj from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1291
2025-12-04T11:31:57.5991632Z #176 run_mod from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1312
2025-12-04T11:31:57.5991912Z #177 pyrun_file from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:1208
2025-12-04T11:31:57.5992270Z #178 _PyRun_SimpleFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:456
2025-12-04T11:31:57.5992618Z #179 _PyRun_AnyFileObject from /usr/local/src/conda/python-3.10.14/Python/pythonrun.c:90
2025-12-04T11:31:57.5992917Z #180 pymain_run_file_obj from /usr/local/src/conda/python-3.10.14/Modules/main.c:357
2025-12-04T11:31:57.5993181Z #181 Py_BytesMain from /usr/local/src/conda/python-3.10.14/Modules/main.c:1090
2025-12-04T11:31:57.5993438Z #182 __libc_start_call_main from ./csu/../sysdeps/nptl/libc_start_call_main.h:58
2025-12-04T11:31:57.5993640Z #183 __libc_start_main_impl from ./csu/../csu/libc-start.c:392
2025-12-04T11:31:57.5993767Z #184 _start from ??:0
2025-12-04T11:31:57.5993886Z #185 <unwind unsupported> from ??:0
2025-12-04T11:31:57.5993905Z 
2025-12-04T11:31:57.5993909Z 
2025-12-04T11:31:57.5994123Z To execute this test, run the following from the base repo dir:
2025-12-04T11:31:57.5994657Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_unbacked_symints.py TestUnbackedSymintsCUDA.test_sdpfa_cuda
2025-12-04T11:31:57.5994662Z 
2025-12-04T11:31:57.5994937Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T11:31:57.5995142Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T11:31:57.5995349Z ================== 1 failed, 31 deselected, 2 rerun in 23.01s ==================
2025-12-04T11:31:57.5995444Z Got exit code 1
2025-12-04T11:31:57.5995864Z FAILED CONSISTENTLY: test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda
2025-12-04T11:31:57.5996274Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T11:31:57.5996867Z Test results will be stored in test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-a04a8b8b31c4f983.xml
2025-12-04T11:31:57.5997029Z ============================= test session starts ==============================
2025-12-04T11:31:57.5997389Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T11:31:57.5997494Z cachedir: .pytest_cache
2025-12-04T11:31:57.5998016Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T11:31:57.5998135Z rootdir: /var/lib/jenkins/workspace
2025-12-04T11:31:57.5998238Z configfile: pytest.ini
2025-12-04T11:31:57.5998826Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T11:31:57.5999043Z collecting ... collected 32 items / 18 deselected / 14 selected
2025-12-04T11:31:57.5999183Z stepcurrent: skipping 18 already run items.
2025-12-04T11:31:57.5999307Z Running 14 items in this shard
2025-12-04T11:31:57.5999312Z 
2025-12-04T11:31:57.5999728Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_softmax_cuda PASSED [4.2993s] [  7%]
2025-12-04T11:31:57.6000188Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_split_with_sizes_cuda PASSED [0.5554s] [ 14%]
2025-12-04T11:31:57.6000676Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_to_int_with_unbacked_size_cuda PASSED [0.4729s] [ 21%]
2025-12-04T11:31:57.6001386Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_triton_kernel_grid_cuda PASSED [1.1378s] [ 28%]
2025-12-04T11:31:57.6001978Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_triton_kernel_with_unbacked_symint_fallback_cuda PASSED [0.7579s] [ 35%]
2025-12-04T11:31:57.6003026Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_linear_layer_norm_input_cuda W1204 11:31:48.891000 105516 site-packages/torch/_inductor/utils.py:1703] [0/0] Not enough SMs to use max_autotune_gemm mode
2025-12-04T11:31:57.6003144Z PASSED [4.6323s] [ 42%]
2025-12-04T11:31:57.6003630Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_masked_scatter_cuda PASSED [0.6073s] [ 50%]
2025-12-04T11:31:57.6004134Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_range_tree_divisor_cuda PASSED [0.3520s] [ 57%]
2025-12-04T11:31:57.6004682Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_repeat_cuda PASSED [0.4206s] [ 64%]
2025-12-04T11:31:57.6005218Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic2_cuda PASSED [0.6099s] [ 71%]
2025-12-04T11:31:57.6005789Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic_False_cuda PASSED [0.2889s] [ 78%]
2025-12-04T11:31:57.6006381Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_unbacked_slice_on_subclass_dynamic_True_cuda PASSED [0.4219s] [ 85%]
2025-12-04T11:31:57.6006923Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_vertical_pointwise_reduction_fusion_cuda PASSED [0.7337s] [ 92%]
2025-12-04T11:31:57.6007417Z inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_view_of_slice_cuda PASSED [0.4058s] [100%]
2025-12-04T11:31:57.6007423Z 
2025-12-04T11:31:57.6008208Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-a04a8b8b31c4f983.xml -
2025-12-04T11:31:57.6008442Z ====================== 14 passed, 18 deselected in 15.74s ======================
2025-12-04T11:31:57.6009460Z The following tests failed consistently: ['test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdfpa_unbacked_strides_cuda', 'test/inductor/test_unbacked_symints.py::TestUnbackedSymintsCUDA::test_sdpfa_cuda']
2025-12-04T11:31:57.6009469Z 
2025-12-04T11:31:57.6010058Z FINISHED PRINTING LOG FILE of inductor/test_unbacked_symints 1/1 (test/test-reports/inductor.test_unbacked_symints_1.1_e6e3a96590269886_.log)
2025-12-04T11:31:57.6010064Z 
2025-12-04T11:31:57.6010430Z Finished inductor/test_unbacked_symints 1/1 ... [2025-12-04 11:31:57.322331][8274.932232816], took 3.65min
2025-12-04T11:31:57.6011254Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-ad02460068a39927.xml
2025-12-04T11:31:57.6012143Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-e60f88ff4be47487.xml
2025-12-04T11:31:57.6012958Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-2d7921f0967c562b.xml
2025-12-04T11:31:57.6013785Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-38b03205b4b4e8b2.xml
2025-12-04T11:31:57.6014597Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-4b997b321b918bd4.xml
2025-12-04T11:31:57.6015434Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-c0ee399e0a993179.xml
2025-12-04T11:31:57.6016252Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-a04a8b8b31c4f983.xml
2025-12-04T11:31:57.9557228Z Uploading logs for 57119749427 to S3
2025-12-04T11:31:58.0481954Z Uploading artifacts took 0.44 seconds
2025-12-04T11:31:58.0482394Z inductor/test_unbacked_symints 1/1 failed!
2025-12-04T11:31:58.0486278Z Running inductor/test_scatter_optimization 1/1 ... [2025-12-04 11:31:58.048443][8275.658350637]
2025-12-04T11:31:58.0486894Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:31:58.0491043Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_scatter_optimization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:31:58.048877]
2025-12-04T11:32:19.2447604Z 
2025-12-04T11:32:19.2448949Z inductor/test_scatter_optimization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_scatter_optimization_1.1_7430a249406bb12a_.log
2025-12-04T11:32:19.2453360Z Running 8 items in this shard: test/inductor/test_scatter_optimization.py::TestScatterOpt::test_3d_tensor, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_can_not_optimize_due_to_dense, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_can_not_optimize_due_to_non_const, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_cross_entropy_loss, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_neg_scatter_dim, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_non_last_dim, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_nonzero_const_tensor, test/inductor/test_scatter_optimization.py::TestScatterOpt::test_shorter_index_tensor
2025-12-04T11:32:19.2456959Z 
2025-12-04T11:32:19.2457363Z Finished inductor/test_scatter_optimization 1/1 ... [2025-12-04 11:32:19.244555][8296.854464521], took 0.35min
2025-12-04T11:32:19.2559514Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-b27b3789d1f96ec3.xml
2025-12-04T11:32:19.3293368Z Running inductor/test_mix_order_reduction 1/2 ... [2025-12-04 11:32:19.329016][8296.938922828]
2025-12-04T11:32:19.3293967Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T11:32:19.3297047Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_mix_order_reduction.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 11:32:19.329468]
2025-12-04T12:12:57.4238981Z 
2025-12-04T12:12:57.4240251Z PRINTING LOG FILE of inductor/test_mix_order_reduction 1/2 (test/test-reports/inductor.test_mix_order_reduction_1.2_f2061367e8c27b7f_.log)
2025-12-04T12:12:57.4242300Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-25ac053e9312843a.xml
2025-12-04T12:12:57.4243469Z ============================= test session starts ==============================
2025-12-04T12:12:57.4244368Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.4245238Z cachedir: .pytest_cache
2025-12-04T12:12:57.4246251Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.4247363Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.4247894Z configfile: pytest.ini
2025-12-04T12:12:57.4249299Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.4250174Z collecting ... collected 380 items
2025-12-04T12:12:57.4250575Z stepcurrent: Cannot find last run test, not skipping
2025-12-04T12:12:57.4451920Z Running 175 items in this shard: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims2, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_True_shape2, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape2, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape2, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape2, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape2, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_multi_workspace_allocation, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_non_contiguous_input, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims0, test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims2, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_3layer_split_reduction, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_XBLOCK_coordest_tuning, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_True_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape0, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims2, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_True_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape2, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape2, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_False_shape1, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_False_shape2, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape0, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape1, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_multi_workspace_allocation, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_non_contiguous_input, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_1_add_1dim_False, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False, 
test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_False, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True, test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims1
2025-12-04T12:12:57.4585009Z 
2025-12-04T12:12:57.4585660Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_False_shape0 PASSED [5.5904s] [  0%]
2025-12-04T12:12:57.4586999Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_False_shape1 PASSED [1.0034s] [  1%]
2025-12-04T12:12:57.4588356Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape0 PASSED [1.0544s] [  1%]
2025-12-04T12:12:57.4589735Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape1 PASSED [1.4576s] [  2%]
2025-12-04T12:12:57.4591126Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape0 PASSED [1.4458s] [  2%]
2025-12-04T12:12:57.4592505Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape1 PASSED [1.4676s] [  3%]
2025-12-04T12:12:57.4593828Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims2 PASSED [2.5870s] [  4%]
2025-12-04T12:12:57.4595344Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_False_shape0 PASSED [0.8181s] [  4%]
2025-12-04T12:12:57.4596819Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape0 PASSED [0.8190s] [  5%]
2025-12-04T12:12:57.4598289Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape1 PASSED [1.0635s] [  5%]
2025-12-04T12:12:57.4599833Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_False_shape0 PASSED [0.7973s] [  6%]
2025-12-04T12:12:57.4601456Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_True_shape2 PASSED [0.5962s] [  6%]
2025-12-04T12:12:57.4603031Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape0 PASSED [1.4498s] [  7%]
2025-12-04T12:12:57.4605556Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape2 SKIPPED [0.0031s] (Invalid combination) [  8%]
2025-12-04T12:12:57.4607139Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape0 PASSED [0.4343s] [  8%]
2025-12-04T12:12:57.4608707Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape2 SKIPPED [0.0030s] (Invalid combination) [  9%]
2025-12-04T12:12:57.4610262Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape1 PASSED [0.4336s] [  9%]
2025-12-04T12:12:57.4611832Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape2 SKIPPED [0.0031s] (Invalid combination) [ 10%]
2025-12-04T12:12:57.4613411Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape0 PASSED [0.4593s] [ 10%]
2025-12-04T12:12:57.4614871Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape1 PASSED [0.4727s] [ 11%]
2025-12-04T12:12:57.4616320Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape0 PASSED [0.4698s] [ 12%]
2025-12-04T12:12:57.4617753Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape2 PASSED [0.4813s] [ 12%]
2025-12-04T12:12:57.4619194Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape0 PASSED [0.4735s] [ 13%]
2025-12-04T12:12:57.4620463Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_multi_workspace_allocation PASSED [0.5243s] [ 13%]
2025-12-04T12:12:57.4621516Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_non_contiguous_input PASSED [0.4761s] [ 14%]
2025-12-04T12:12:57.4622986Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.2257s] [ 14%]
2025-12-04T12:12:57.4624915Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1598s] [ 14%]
2025-12-04T12:12:57.4626757Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1551s] [ 14%]
2025-12-04T12:12:57.4627721Z 
2025-12-04T12:12:57.4627917Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.4628760Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.4629557Z Traceback (most recent call last):
2025-12-04T12:12:57.4630258Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4631054Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4631636Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4631972Z 
2025-12-04T12:12:57.4632183Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4633495Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4634556Z 
2025-12-04T12:12:57.4634834Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4635498Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4635958Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4636296Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4636725Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4637398Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4637967Z graph_break []
2025-12-04T12:12:57.4638333Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4639407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4640345Z   warnings.warn(
2025-12-04T12:12:57.4641058Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.4641865Z Traceback (most recent call last):
2025-12-04T12:12:57.4642801Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4643618Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4644171Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4644512Z 
2025-12-04T12:12:57.4644746Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4646045Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4647148Z 
2025-12-04T12:12:57.4647419Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4648052Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4648531Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4648856Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4649294Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4649997Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4650568Z graph_break []
2025-12-04T12:12:57.4650942Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4652040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4653024Z   warnings.warn(
2025-12-04T12:12:57.4653392Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4653866Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4654206Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4654686Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4655503Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4656070Z graph_break []
2025-12-04T12:12:57.4656438Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4657531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4658479Z   warnings.warn(
2025-12-04T12:12:57.4658784Z =================================== FAILURES ===================================
2025-12-04T12:12:57.4659644Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.4660453Z Traceback (most recent call last):
2025-12-04T12:12:57.4661147Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4661969Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4662488Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4662830Z 
2025-12-04T12:12:57.4663041Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4664313Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4665364Z 
2025-12-04T12:12:57.4665635Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4666237Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4666700Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4667028Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4667445Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4668135Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4668701Z graph_break []
2025-12-04T12:12:57.4669067Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4670124Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4671072Z   warnings.warn(
2025-12-04T12:12:57.4671442Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4671891Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4672219Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4672644Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4673322Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4673881Z graph_break []
2025-12-04T12:12:57.4674242Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4675319Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4676265Z   warnings.warn(
2025-12-04T12:12:57.4676628Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4677092Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4677426Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4677842Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4678527Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4679101Z graph_break []
2025-12-04T12:12:57.4679471Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4680570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4681524Z   warnings.warn(
2025-12-04T12:12:57.4682559Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-25ac053e9312843a.xml -
2025-12-04T12:12:57.4683696Z =========================== short test summary info ============================
2025-12-04T12:12:57.4685076Z FAILED [0.1551s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4686335Z 
2025-12-04T12:12:57.4686550Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4687829Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4688919Z 
2025-12-04T12:12:57.4689196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4689776Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.4690289Z ============== 1 failed, 22 passed, 3 skipped, 2 rerun in 25.02s ===============
2025-12-04T12:12:57.4690730Z Got exit code 1
2025-12-04T12:12:57.4690996Z Retrying single test...
2025-12-04T12:12:57.4691796Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-44c34e945447da70.xml
2025-12-04T12:12:57.4695653Z ============================= test session starts ==============================
2025-12-04T12:12:57.4696307Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.4696904Z cachedir: .pytest_cache
2025-12-04T12:12:57.4697583Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.4698352Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.4698698Z configfile: pytest.ini
2025-12-04T12:12:57.4699450Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.4700389Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.4701999Z stepcurrent: skipping 25 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4703272Z Running 1 items in this shard
2025-12-04T12:12:57.4703484Z 
2025-12-04T12:12:57.4704379Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5527s] [100%]
2025-12-04T12:12:57.4706304Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1644s] [100%]
2025-12-04T12:12:57.4708146Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1590s] [100%]
2025-12-04T12:12:57.4709101Z 
2025-12-04T12:12:57.4709240Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.4710071Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.4710867Z Traceback (most recent call last):
2025-12-04T12:12:57.4711688Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4712484Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4713017Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4713349Z 
2025-12-04T12:12:57.4713557Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4714879Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4715951Z 
2025-12-04T12:12:57.4716256Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4716879Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4717339Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4717725Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4718272Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4718948Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4719400Z graph_break []
2025-12-04T12:12:57.4719768Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4720842Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4721778Z   warnings.warn(
2025-12-04T12:12:57.4722553Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.4723371Z Traceback (most recent call last):
2025-12-04T12:12:57.4724054Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4724854Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4725386Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4725717Z 
2025-12-04T12:12:57.4725942Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4727204Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4728265Z 
2025-12-04T12:12:57.4728524Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4729137Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4729601Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4729915Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4730458Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4731139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4731571Z graph_break []
2025-12-04T12:12:57.4731934Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4733003Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4733955Z   warnings.warn(
2025-12-04T12:12:57.4734311Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4734773Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4735101Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4735518Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4736197Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4736762Z graph_break []
2025-12-04T12:12:57.4737180Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4738234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4739179Z   warnings.warn(
2025-12-04T12:12:57.4739490Z =================================== FAILURES ===================================
2025-12-04T12:12:57.4740351Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.4741160Z Traceback (most recent call last):
2025-12-04T12:12:57.4741854Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4742679Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4743203Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4743583Z 
2025-12-04T12:12:57.4743795Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4745060Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4746115Z 
2025-12-04T12:12:57.4746385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4746990Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4747453Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4747789Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4748325Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4749010Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4749465Z graph_break []
2025-12-04T12:12:57.4749833Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4750891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4751846Z   warnings.warn(
2025-12-04T12:12:57.4752224Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4752676Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4753012Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4753441Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4754126Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4754686Z graph_break []
2025-12-04T12:12:57.4755059Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4756129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4757065Z   warnings.warn(
2025-12-04T12:12:57.4757439Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4757901Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4758229Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4758644Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4759324Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4759888Z graph_break []
2025-12-04T12:12:57.4760240Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4761306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4762316Z   warnings.warn(
2025-12-04T12:12:57.4763322Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-44c34e945447da70.xml -
2025-12-04T12:12:57.4764411Z =========================== short test summary info ============================
2025-12-04T12:12:57.4765814Z FAILED [0.1590s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4767016Z 
2025-12-04T12:12:57.4767227Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4768505Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4769595Z 
2025-12-04T12:12:57.4769870Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4770466Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.4770978Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.4771418Z Got exit code 1
2025-12-04T12:12:57.4771670Z Retrying single test...
2025-12-04T12:12:57.4772489Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-faae5acc9f254e31.xml
2025-12-04T12:12:57.4773417Z ============================= test session starts ==============================
2025-12-04T12:12:57.4774065Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.4774644Z cachedir: .pytest_cache
2025-12-04T12:12:57.4775344Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.4776116Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.4776448Z configfile: pytest.ini
2025-12-04T12:12:57.4777207Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.4778136Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.4779495Z stepcurrent: skipping 25 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4780728Z Running 1 items in this shard
2025-12-04T12:12:57.4780945Z 
2025-12-04T12:12:57.4781846Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5663s] [100%]
2025-12-04T12:12:57.4783775Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1625s] [100%]
2025-12-04T12:12:57.4785615Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1590s] [100%]
2025-12-04T12:12:57.4786556Z 
2025-12-04T12:12:57.4786709Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.4787528Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.4788338Z Traceback (most recent call last):
2025-12-04T12:12:57.4789037Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4789828Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4790380Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4790725Z 
2025-12-04T12:12:57.4790933Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4792206Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4793254Z 
2025-12-04T12:12:57.4793561Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4794161Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4794624Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4794984Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4795514Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4796199Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4796702Z graph_break []
2025-12-04T12:12:57.4797067Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4798130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4799079Z   warnings.warn(
2025-12-04T12:12:57.4799795Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.4800589Z Traceback (most recent call last):
2025-12-04T12:12:57.4801494Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4802349Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4802885Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4803215Z 
2025-12-04T12:12:57.4803426Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4804700Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4805767Z 
2025-12-04T12:12:57.4806025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4806639Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4807091Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4807422Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4807966Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4808642Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4809091Z graph_break []
2025-12-04T12:12:57.4809459Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4810543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4811482Z   warnings.warn(
2025-12-04T12:12:57.4811860Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4812329Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4812647Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4813077Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4813751Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4814318Z graph_break []
2025-12-04T12:12:57.4814667Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4815832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4816788Z   warnings.warn(
2025-12-04T12:12:57.4817082Z =================================== FAILURES ===================================
2025-12-04T12:12:57.4817922Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.4818742Z Traceback (most recent call last):
2025-12-04T12:12:57.4819488Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4820273Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4820813Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4821191Z 
2025-12-04T12:12:57.4821421Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4822698Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4823796Z 
2025-12-04T12:12:57.4824057Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4824674Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4825141Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4825463Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4826016Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4826707Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4827159Z graph_break []
2025-12-04T12:12:57.4827515Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4828592Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4829541Z   warnings.warn(
2025-12-04T12:12:57.4829909Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4830382Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4830719Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4831145Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4831810Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4832373Z graph_break []
2025-12-04T12:12:57.4832734Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4833779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4834727Z   warnings.warn(
2025-12-04T12:12:57.4835101Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4835571Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4835888Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4836320Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4837001Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4837557Z graph_break []
2025-12-04T12:12:57.4837932Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4838993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4839945Z   warnings.warn(
2025-12-04T12:12:57.4840895Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-faae5acc9f254e31.xml -
2025-12-04T12:12:57.4842052Z =========================== short test summary info ============================
2025-12-04T12:12:57.4843498Z FAILED [0.1590s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4844679Z 
2025-12-04T12:12:57.4844906Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4846203Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4847269Z 
2025-12-04T12:12:57.4847566Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4848148Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.4848662Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.4849120Z Got exit code 1
2025-12-04T12:12:57.4850126Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.4851509Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.4852677Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1475705e30056d51.xml
2025-12-04T12:12:57.4853583Z ============================= test session starts ==============================
2025-12-04T12:12:57.4854230Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.4854815Z cachedir: .pytest_cache
2025-12-04T12:12:57.4855516Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.4856274Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.4856619Z configfile: pytest.ini
2025-12-04T12:12:57.4857373Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.4858289Z collecting ... collected 380 items / 26 deselected / 354 selected
2025-12-04T12:12:57.4858779Z stepcurrent: skipping 26 already run items.
2025-12-04T12:12:57.4859158Z Running 149 items in this shard
2025-12-04T12:12:57.4859363Z 
2025-12-04T12:12:57.4860273Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5416s] [  0%]
2025-12-04T12:12:57.4862184Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1613s] [  0%]
2025-12-04T12:12:57.4864021Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1569s] [  0%]
2025-12-04T12:12:57.4864975Z 
2025-12-04T12:12:57.4865113Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.4865946Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.4866743Z Traceback (most recent call last):
2025-12-04T12:12:57.4867436Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4868227Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4868755Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4869085Z 
2025-12-04T12:12:57.4869327Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4870604Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.4871674Z 
2025-12-04T12:12:57.4871934Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4872585Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4873039Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4873369Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4873911Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4874636Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4875072Z graph_break []
2025-12-04T12:12:57.4875441Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4876546Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4877484Z   warnings.warn(
2025-12-04T12:12:57.4878207Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.4879018Z Traceback (most recent call last):
2025-12-04T12:12:57.4879719Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4880498Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4881032Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4881360Z 
2025-12-04T12:12:57.4881580Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4882899Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.4883968Z 
2025-12-04T12:12:57.4884228Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4884846Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4885313Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4885633Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4886180Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4886873Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4887326Z graph_break []
2025-12-04T12:12:57.4887677Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4888755Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4889708Z   warnings.warn(
2025-12-04T12:12:57.4890069Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4890537Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4890867Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4891294Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4891968Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4892540Z graph_break []
2025-12-04T12:12:57.4892907Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4893961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4894913Z   warnings.warn(
2025-12-04T12:12:57.4895273Z =================================== FAILURES ===================================
2025-12-04T12:12:57.4896121Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.4896916Z Traceback (most recent call last):
2025-12-04T12:12:57.4897611Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4898526Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4899051Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4899395Z 
2025-12-04T12:12:57.4899605Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4901111Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.4902236Z 
2025-12-04T12:12:57.4902520Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4903121Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4903592Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4903920Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4904460Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4905134Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4905581Z graph_break []
2025-12-04T12:12:57.4905945Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4907000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4907951Z   warnings.warn(
2025-12-04T12:12:57.4908328Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4908791Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4909105Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4909537Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4910220Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4910774Z graph_break []
2025-12-04T12:12:57.4911141Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4912205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4913158Z   warnings.warn(
2025-12-04T12:12:57.4913520Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4913978Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4914307Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4914726Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4915404Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4915966Z graph_break []
2025-12-04T12:12:57.4916312Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4917378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4918325Z   warnings.warn(
2025-12-04T12:12:57.4919285Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1475705e30056d51.xml -
2025-12-04T12:12:57.4920374Z =========================== short test summary info ============================
2025-12-04T12:12:57.4921798Z FAILED [0.1569s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4923076Z 
2025-12-04T12:12:57.4923291Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4924621Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.4925678Z 
2025-12-04T12:12:57.4925953Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4926515Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.4927071Z ================== 1 failed, 26 deselected, 2 rerun in 4.91s ===================
2025-12-04T12:12:57.4927503Z Got exit code 1
2025-12-04T12:12:57.4927752Z Retrying single test...
2025-12-04T12:12:57.4928596Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-91702530804e6018.xml
2025-12-04T12:12:57.4929514Z ============================= test session starts ==============================
2025-12-04T12:12:57.4930157Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.4930729Z cachedir: .pytest_cache
2025-12-04T12:12:57.4931419Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.4932182Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.4932512Z configfile: pytest.ini
2025-12-04T12:12:57.4933271Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.4934206Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.4935575Z stepcurrent: skipping 26 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.4936812Z Running 1 items in this shard
2025-12-04T12:12:57.4937028Z 
2025-12-04T12:12:57.4937921Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5363s] [100%]
2025-12-04T12:12:57.4939832Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1617s] [100%]
2025-12-04T12:12:57.4941664Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1575s] [100%]
2025-12-04T12:12:57.4942612Z 
2025-12-04T12:12:57.4942762Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.4943583Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.4944393Z Traceback (most recent call last):
2025-12-04T12:12:57.4945091Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4945876Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4946399Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4946745Z 
2025-12-04T12:12:57.4946953Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4948271Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.4949331Z 
2025-12-04T12:12:57.4949609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4950213Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4950677Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4951007Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4951577Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4952262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4952710Z graph_break []
2025-12-04T12:12:57.4953072Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4954167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4955153Z   warnings.warn(
2025-12-04T12:12:57.4955870Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.4956674Z Traceback (most recent call last):
2025-12-04T12:12:57.4957372Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4958161Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4958701Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4959036Z 
2025-12-04T12:12:57.4959247Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4960518Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.4961585Z 
2025-12-04T12:12:57.4961850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4962536Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4962993Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4963331Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4963881Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4964574Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4965016Z graph_break []
2025-12-04T12:12:57.4965387Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4966460Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4967403Z   warnings.warn(
2025-12-04T12:12:57.4967783Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4968246Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4968583Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4968992Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4969677Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4970250Z graph_break []
2025-12-04T12:12:57.4970606Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4971681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4972644Z   warnings.warn(
2025-12-04T12:12:57.4972942Z =================================== FAILURES ===================================
2025-12-04T12:12:57.4973763Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.4974617Z Traceback (most recent call last):
2025-12-04T12:12:57.4975314Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.4976086Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.4976614Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.4976953Z 
2025-12-04T12:12:57.4977161Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.4978467Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.4979546Z 
2025-12-04T12:12:57.4979805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.4980413Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4980881Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4981244Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4981777Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4982470Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4982919Z graph_break []
2025-12-04T12:12:57.4983269Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4984338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4985285Z   warnings.warn(
2025-12-04T12:12:57.4985661Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4986109Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4986437Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4986996Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4987672Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4988246Z graph_break []
2025-12-04T12:12:57.4988617Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4989684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4990618Z   warnings.warn(
2025-12-04T12:12:57.4990997Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.4991461Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.4991780Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.4992209Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.4992888Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.4993456Z graph_break []
2025-12-04T12:12:57.4993808Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.4994873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.4995818Z   warnings.warn(
2025-12-04T12:12:57.4996762Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-91702530804e6018.xml -
2025-12-04T12:12:57.4997861Z =========================== short test summary info ============================
2025-12-04T12:12:57.4999230Z FAILED [0.1575s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5000426Z 
2025-12-04T12:12:57.5000695Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5002483Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5003538Z 
2025-12-04T12:12:57.5003799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5004490Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5005010Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:57.5005430Z Got exit code 1
2025-12-04T12:12:57.5005696Z Retrying single test...
2025-12-04T12:12:57.5006558Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5a377a8e3e546caa.xml
2025-12-04T12:12:57.5007475Z ============================= test session starts ==============================
2025-12-04T12:12:57.5008155Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5008746Z cachedir: .pytest_cache
2025-12-04T12:12:57.5009437Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5010200Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5010531Z configfile: pytest.ini
2025-12-04T12:12:57.5011294Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5012222Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.5013582Z stepcurrent: skipping 26 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5014833Z Running 1 items in this shard
2025-12-04T12:12:57.5015054Z 
2025-12-04T12:12:57.5015953Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5584s] [100%]
2025-12-04T12:12:57.5017875Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1657s] [100%]
2025-12-04T12:12:57.5019712Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1596s] [100%]
2025-12-04T12:12:57.5020651Z 
2025-12-04T12:12:57.5020789Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5021624Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5022431Z Traceback (most recent call last):
2025-12-04T12:12:57.5023127Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5066291Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5067025Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5067367Z 
2025-12-04T12:12:57.5067615Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5068897Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5069981Z 
2025-12-04T12:12:57.5070246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5071031Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5071515Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5071837Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5072388Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5073086Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5073541Z graph_break []
2025-12-04T12:12:57.5073936Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5075017Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5076012Z   warnings.warn(
2025-12-04T12:12:57.5076714Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5077527Z Traceback (most recent call last):
2025-12-04T12:12:57.5078267Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5079041Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5079574Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5079919Z 
2025-12-04T12:12:57.5080129Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5081403Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5082545Z 
2025-12-04T12:12:57.5082823Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5083433Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5083897Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5084231Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5084765Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5085455Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5085909Z graph_break []
2025-12-04T12:12:57.5086256Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5087332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5088280Z   warnings.warn(
2025-12-04T12:12:57.5088652Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5089103Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5089433Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5089860Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5090533Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5091098Z graph_break []
2025-12-04T12:12:57.5091466Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5092530Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5093467Z   warnings.warn(
2025-12-04T12:12:57.5093776Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5094613Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5095412Z Traceback (most recent call last):
2025-12-04T12:12:57.5096107Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5096939Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5097479Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5097809Z 
2025-12-04T12:12:57.5098020Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5099291Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5100396Z 
2025-12-04T12:12:57.5100657Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5101492Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5102025Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5102357Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5102904Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5103597Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5104082Z graph_break []
2025-12-04T12:12:57.5104447Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5105520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5106451Z   warnings.warn(
2025-12-04T12:12:57.5106832Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5107297Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5107615Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5108038Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5108719Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5109287Z graph_break []
2025-12-04T12:12:57.5109644Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5110711Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5111645Z   warnings.warn(
2025-12-04T12:12:57.5112001Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5112449Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5112771Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5113191Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5113858Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5114432Z graph_break []
2025-12-04T12:12:57.5114780Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5115828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5116770Z   warnings.warn(
2025-12-04T12:12:57.5117727Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5a377a8e3e546caa.xml -
2025-12-04T12:12:57.5118811Z =========================== short test summary info ============================
2025-12-04T12:12:57.5120171Z FAILED [0.1596s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5121384Z 
2025-12-04T12:12:57.5121585Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5122977Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5124026Z 
2025-12-04T12:12:57.5124294Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5124863Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5125367Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.5125805Z Got exit code 1
2025-12-04T12:12:57.5126857Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5128220Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.5129402Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2d9eb46c30fffb97.xml
2025-12-04T12:12:57.5130314Z ============================= test session starts ==============================
2025-12-04T12:12:57.5130980Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5131545Z cachedir: .pytest_cache
2025-12-04T12:12:57.5132218Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5132966Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5133294Z configfile: pytest.ini
2025-12-04T12:12:57.5134021Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5134931Z collecting ... collected 380 items / 27 deselected / 353 selected
2025-12-04T12:12:57.5135401Z stepcurrent: skipping 27 already run items.
2025-12-04T12:12:57.5135755Z Running 148 items in this shard
2025-12-04T12:12:57.5135959Z 
2025-12-04T12:12:57.5136847Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5578s] [  0%]
2025-12-04T12:12:57.5138782Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1666s] [  0%]
2025-12-04T12:12:57.5140589Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1637s] [  0%]
2025-12-04T12:12:57.5141515Z 
2025-12-04T12:12:57.5141655Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5142459Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.5143244Z Traceback (most recent call last):
2025-12-04T12:12:57.5143927Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5144700Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5145207Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5145540Z 
2025-12-04T12:12:57.5145741Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5146991Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5148037Z 
2025-12-04T12:12:57.5148297Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5148890Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5149332Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5149678Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5150195Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5150864Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5151291Z graph_break []
2025-12-04T12:12:57.5151635Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5154652Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5157605Z   return x.grad, w.grad
2025-12-04T12:12:57.5158478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5159404Z   warnings.warn(
2025-12-04T12:12:57.5162287Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5165140Z   return x.grad, w.grad
2025-12-04T12:12:57.5165852Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.5166637Z Traceback (most recent call last):
2025-12-04T12:12:57.5167307Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5168071Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5168580Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5168912Z 
2025-12-04T12:12:57.5169114Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5170366Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5171410Z 
2025-12-04T12:12:57.5171671Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5172267Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5172718Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5173032Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5173559Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5174220Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5174650Z graph_break []
2025-12-04T12:12:57.5174998Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5178025Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5180854Z   return x.grad, w.grad
2025-12-04T12:12:57.5181739Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5182700Z   warnings.warn(
2025-12-04T12:12:57.5185476Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5188378Z   return x.grad, w.grad
2025-12-04T12:12:57.5188745Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5189187Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5189496Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5189905Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5190569Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5191118Z graph_break []
2025-12-04T12:12:57.5191461Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5194434Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5197269Z   return x.grad, w.grad
2025-12-04T12:12:57.5198152Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5199083Z   warnings.warn(
2025-12-04T12:12:57.5202070Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5204985Z   return x.grad, w.grad
2025-12-04T12:12:57.5205284Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5206104Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.5206893Z Traceback (most recent call last):
2025-12-04T12:12:57.5207561Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5208334Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5208855Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5209180Z 
2025-12-04T12:12:57.5209467Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5210714Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5211766Z 
2025-12-04T12:12:57.5212020Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5212658Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5213106Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5213414Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5213981Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5214648Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5215081Z graph_break []
2025-12-04T12:12:57.5215422Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5218453Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5221317Z   return x.grad, w.grad
2025-12-04T12:12:57.5222196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5223125Z   warnings.warn(
2025-12-04T12:12:57.5225915Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5228760Z   return x.grad, w.grad
2025-12-04T12:12:57.5229141Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5229589Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5229893Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5230305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5230968Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5231520Z graph_break []
2025-12-04T12:12:57.5231860Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5234856Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5237717Z   return x.grad, w.grad
2025-12-04T12:12:57.5238649Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5239602Z   warnings.warn(
2025-12-04T12:12:57.5242491Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5245391Z   return x.grad, w.grad
2025-12-04T12:12:57.5245785Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5246257Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5246578Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5247043Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5247728Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5248297Z graph_break []
2025-12-04T12:12:57.5248649Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5249722Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5250674Z   warnings.warn(
2025-12-04T12:12:57.5253485Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5256350Z   return x.grad, w.grad
2025-12-04T12:12:57.5257326Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2d9eb46c30fffb97.xml -
2025-12-04T12:12:57.5258436Z =========================== short test summary info ============================
2025-12-04T12:12:57.5259802Z FAILED [0.1637s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5260986Z 
2025-12-04T12:12:57.5261209Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5262467Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5263523Z 
2025-12-04T12:12:57.5263783Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5264359Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5264866Z ================== 1 failed, 27 deselected, 2 rerun in 4.94s ===================
2025-12-04T12:12:57.5265281Z Got exit code 1
2025-12-04T12:12:57.5265537Z Retrying single test...
2025-12-04T12:12:57.5266345Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b2fcdf54f0dd8b56.xml
2025-12-04T12:12:57.5267255Z ============================= test session starts ==============================
2025-12-04T12:12:57.5267936Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5268523Z cachedir: .pytest_cache
2025-12-04T12:12:57.5269212Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5269958Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5270303Z configfile: pytest.ini
2025-12-04T12:12:57.5271111Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5272046Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.5273390Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5274667Z Running 1 items in this shard
2025-12-04T12:12:57.5274905Z 
2025-12-04T12:12:57.5275804Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5914s] [100%]
2025-12-04T12:12:57.5277704Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1669s] [100%]
2025-12-04T12:12:57.5279528Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1648s] [100%]
2025-12-04T12:12:57.5280485Z 
2025-12-04T12:12:57.5280623Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5281448Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.5282322Z Traceback (most recent call last):
2025-12-04T12:12:57.5283007Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5283799Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5284341Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5284673Z 
2025-12-04T12:12:57.5284900Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5286159Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5287216Z 
2025-12-04T12:12:57.5287476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5288096Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5288564Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5288881Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5289424Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5290106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5290539Z graph_break []
2025-12-04T12:12:57.5290900Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5293944Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5296806Z   return x.grad, w.grad
2025-12-04T12:12:57.5297704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5298641Z   warnings.warn(
2025-12-04T12:12:57.5301729Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5304693Z   return x.grad, w.grad
2025-12-04T12:12:57.5305428Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.5306234Z Traceback (most recent call last):
2025-12-04T12:12:57.5306916Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5307703Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5308240Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5308571Z 
2025-12-04T12:12:57.5308791Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5310045Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5311102Z 
2025-12-04T12:12:57.5311365Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5311976Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5312438Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5312749Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5313283Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5313966Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5314398Z graph_break []
2025-12-04T12:12:57.5314764Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5317759Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5320621Z   return x.grad, w.grad
2025-12-04T12:12:57.5321522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5322516Z   warnings.warn(
2025-12-04T12:12:57.5325370Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5328230Z   return x.grad, w.grad
2025-12-04T12:12:57.5328627Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5329090Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5329405Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5329942Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5330623Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5331176Z graph_break []
2025-12-04T12:12:57.5331575Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5334581Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5337473Z   return x.grad, w.grad
2025-12-04T12:12:57.5338371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5339307Z   warnings.warn(
2025-12-04T12:12:57.5342124Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5344985Z   return x.grad, w.grad
2025-12-04T12:12:57.5345315Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5346139Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.5346946Z Traceback (most recent call last):
2025-12-04T12:12:57.5347641Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5348436Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5348961Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5349308Z 
2025-12-04T12:12:57.5349519Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5350789Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5351842Z 
2025-12-04T12:12:57.5352117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5352717Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5353181Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5353510Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5354052Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5354727Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5355176Z graph_break []
2025-12-04T12:12:57.5355568Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5358590Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5361471Z   return x.grad, w.grad
2025-12-04T12:12:57.5362420Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5363412Z   warnings.warn(
2025-12-04T12:12:57.5366230Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5369091Z   return x.grad, w.grad
2025-12-04T12:12:57.5369471Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5369935Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5370267Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5370676Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5371361Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5371929Z graph_break []
2025-12-04T12:12:57.5372293Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5375291Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5378157Z   return x.grad, w.grad
2025-12-04T12:12:57.5379061Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5380011Z   warnings.warn(
2025-12-04T12:12:57.5382807Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5385655Z   return x.grad, w.grad
2025-12-04T12:12:57.5386035Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5386554Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5386883Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5387299Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5387979Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5388545Z graph_break []
2025-12-04T12:12:57.5388916Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5390010Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5390958Z   warnings.warn(
2025-12-04T12:12:57.5393790Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5396674Z   return x.grad, w.grad
2025-12-04T12:12:57.5397663Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b2fcdf54f0dd8b56.xml -
2025-12-04T12:12:57.5398759Z =========================== short test summary info ============================
2025-12-04T12:12:57.5400129Z FAILED [0.1648s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5401529Z 
2025-12-04T12:12:57.5401747Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5403075Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5404125Z 
2025-12-04T12:12:57.5404385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5404962Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5405478Z ================== 1 failed, 174 deselected, 2 rerun in 4.98s ==================
2025-12-04T12:12:57.5405916Z Got exit code 1
2025-12-04T12:12:57.5406167Z Retrying single test...
2025-12-04T12:12:57.5406981Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e6655594e475c158.xml
2025-12-04T12:12:57.5407897Z ============================= test session starts ==============================
2025-12-04T12:12:57.5408535Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5409120Z cachedir: .pytest_cache
2025-12-04T12:12:57.5409811Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5410574Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5410903Z configfile: pytest.ini
2025-12-04T12:12:57.5411658Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5412588Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.5413949Z stepcurrent: skipping 27 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5415255Z Running 1 items in this shard
2025-12-04T12:12:57.5415476Z 
2025-12-04T12:12:57.5416368Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5656s] [100%]
2025-12-04T12:12:57.5418320Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1673s] [100%]
2025-12-04T12:12:57.5420133Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1649s] [100%]
2025-12-04T12:12:57.5421111Z 
2025-12-04T12:12:57.5421259Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5422077Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.5422919Z Traceback (most recent call last):
2025-12-04T12:12:57.5423616Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5424394Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5424923Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5425264Z 
2025-12-04T12:12:57.5425475Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5426746Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5427797Z 
2025-12-04T12:12:57.5428069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5428670Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5429141Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5429471Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5430001Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5430682Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5431125Z graph_break []
2025-12-04T12:12:57.5431479Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5434484Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5437359Z   return x.grad, w.grad
2025-12-04T12:12:57.5438265Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5439214Z   warnings.warn(
2025-12-04T12:12:57.5442064Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5445097Z   return x.grad, w.grad
2025-12-04T12:12:57.5445912Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.5446741Z Traceback (most recent call last):
2025-12-04T12:12:57.5447442Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5448273Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5448814Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5449147Z 
2025-12-04T12:12:57.5449372Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5450676Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5451768Z 
2025-12-04T12:12:57.5452025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5452647Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5453117Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5453447Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5453984Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5454672Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5455118Z graph_break []
2025-12-04T12:12:57.5455470Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5458478Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5461356Z   return x.grad, w.grad
2025-12-04T12:12:57.5462262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5463213Z   warnings.warn(
2025-12-04T12:12:57.5465999Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5468854Z   return x.grad, w.grad
2025-12-04T12:12:57.5469246Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5469709Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5470026Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5470454Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5471139Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5471708Z graph_break []
2025-12-04T12:12:57.5472059Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5475123Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5477992Z   return x.grad, w.grad
2025-12-04T12:12:57.5478898Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5479888Z   warnings.warn(
2025-12-04T12:12:57.5482742Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5485658Z   return x.grad, w.grad
2025-12-04T12:12:57.5485986Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5486819Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.5487624Z Traceback (most recent call last):
2025-12-04T12:12:57.5488300Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5489085Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5489622Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5489951Z 
2025-12-04T12:12:57.5490163Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5491427Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5492494Z 
2025-12-04T12:12:57.5492752Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5493366Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5493824Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5494157Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5494702Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5495394Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5495834Z graph_break []
2025-12-04T12:12:57.5496195Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5499200Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5502269Z   return x.grad, w.grad
2025-12-04T12:12:57.5503251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5504213Z   warnings.warn(
2025-12-04T12:12:57.5507064Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5509960Z   return x.grad, w.grad
2025-12-04T12:12:57.5510361Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5510815Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5511153Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5511634Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5512308Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5512875Z graph_break []
2025-12-04T12:12:57.5513241Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5516240Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5519109Z   return x.grad, w.grad
2025-12-04T12:12:57.5519989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5520935Z   warnings.warn(
2025-12-04T12:12:57.5523787Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5526639Z   return x.grad, w.grad
2025-12-04T12:12:57.5527032Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5527486Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5527812Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5528241Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5528910Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5529475Z graph_break []
2025-12-04T12:12:57.5529845Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5530924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5531861Z   warnings.warn(
2025-12-04T12:12:57.5534725Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.5537595Z   return x.grad, w.grad
2025-12-04T12:12:57.5538607Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e6655594e475c158.xml -
2025-12-04T12:12:57.5539715Z =========================== short test summary info ============================
2025-12-04T12:12:57.5541101Z FAILED [0.1649s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5542324Z 
2025-12-04T12:12:57.5542534Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5543805Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5544859Z 
2025-12-04T12:12:57.5545134Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5545705Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5546218Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ==================
2025-12-04T12:12:57.5546657Z Got exit code 1
2025-12-04T12:12:57.5547652Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.5549023Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.5550189Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67b37ef947e223df.xml
2025-12-04T12:12:57.5551113Z ============================= test session starts ==============================
2025-12-04T12:12:57.5551771Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5552347Z cachedir: .pytest_cache
2025-12-04T12:12:57.5553045Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5553170Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5553295Z configfile: pytest.ini
2025-12-04T12:12:57.5553874Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5554114Z collecting ... collected 380 items / 28 deselected / 352 selected
2025-12-04T12:12:57.5554254Z stepcurrent: skipping 28 already run items.
2025-12-04T12:12:57.5554367Z Running 147 items in this shard
2025-12-04T12:12:57.5554372Z 
2025-12-04T12:12:57.5555406Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.5556400Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0029s] (Skip non-critical tests to save resources.) [  1%]
2025-12-04T12:12:57.5557461Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0036s] (Skip non-critical tests to save resources.) [  2%]
2025-12-04T12:12:57.5558465Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0027s] (Skip non-critical tests to save resources.) [  2%]
2025-12-04T12:12:57.5559401Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5359s] [  3%]
2025-12-04T12:12:57.5560290Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1634s] [  3%]
2025-12-04T12:12:57.5561135Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1580s] [  3%]
2025-12-04T12:12:57.5561184Z 
2025-12-04T12:12:57.5561327Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5561873Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5562007Z Traceback (most recent call last):
2025-12-04T12:12:57.5562521Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5562722Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5562943Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5562949Z 
2025-12-04T12:12:57.5563162Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5564101Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5564109Z 
2025-12-04T12:12:57.5564372Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5564589Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5564714Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5564827Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5565178Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5565395Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5565491Z graph_break []
2025-12-04T12:12:57.5565719Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5566445Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5566547Z   warnings.warn(
2025-12-04T12:12:57.5567109Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5567227Z Traceback (most recent call last):
2025-12-04T12:12:57.5567698Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5567890Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5568097Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5568102Z 
2025-12-04T12:12:57.5568321Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5569245Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5569255Z 
2025-12-04T12:12:57.5569564Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5569780Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5569890Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5570015Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5570350Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5570578Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5570703Z graph_break []
2025-12-04T12:12:57.5570916Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5571648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5571794Z   warnings.warn(
2025-12-04T12:12:57.5572003Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5572128Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5572287Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5572513Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5572847Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5572943Z graph_break []
2025-12-04T12:12:57.5573167Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5573880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5573976Z   warnings.warn(
2025-12-04T12:12:57.5574131Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5574680Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5574814Z Traceback (most recent call last):
2025-12-04T12:12:57.5575277Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5575470Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5575689Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5575694Z 
2025-12-04T12:12:57.5575900Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5576842Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5576850Z 
2025-12-04T12:12:57.5577108Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5577315Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5577436Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5577552Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5577881Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5578106Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5578201Z graph_break []
2025-12-04T12:12:57.5578420Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5579137Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5579234Z   warnings.warn(
2025-12-04T12:12:57.5579455Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5579567Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5579678Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5579904Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5580266Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5580377Z graph_break []
2025-12-04T12:12:57.5580586Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5581297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5581438Z   warnings.warn(
2025-12-04T12:12:57.5581650Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5581757Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5581881Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5582130Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5582470Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5582565Z graph_break []
2025-12-04T12:12:57.5582806Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5583533Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5583631Z   warnings.warn(
2025-12-04T12:12:57.5584435Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67b37ef947e223df.xml -
2025-12-04T12:12:57.5584615Z =========================== short test summary info ============================
2025-12-04T12:12:57.5585677Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5585685Z 
2025-12-04T12:12:57.5585910Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5586844Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5586850Z 
2025-12-04T12:12:57.5587125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5587301Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5587512Z ============= 1 failed, 4 skipped, 28 deselected, 2 rerun in 4.93s =============
2025-12-04T12:12:57.5587620Z Got exit code 1
2025-12-04T12:12:57.5587725Z Retrying single test...
2025-12-04T12:12:57.5588355Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5d8e49cfad949fb4.xml
2025-12-04T12:12:57.5588526Z ============================= test session starts ==============================
2025-12-04T12:12:57.5588870Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5588990Z cachedir: .pytest_cache
2025-12-04T12:12:57.5589497Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5589617Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5589734Z configfile: pytest.ini
2025-12-04T12:12:57.5590309Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5590547Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.5591558Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5591706Z Running 1 items in this shard
2025-12-04T12:12:57.5591713Z 
2025-12-04T12:12:57.5592617Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5451s] [100%]
2025-12-04T12:12:57.5593542Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1600s] [100%]
2025-12-04T12:12:57.5594370Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1597s] [100%]
2025-12-04T12:12:57.5594408Z 
2025-12-04T12:12:57.5594549Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5595112Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5595261Z Traceback (most recent call last):
2025-12-04T12:12:57.5595725Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5595932Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5596138Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5596143Z 
2025-12-04T12:12:57.5596353Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5597289Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5597297Z 
2025-12-04T12:12:57.5597559Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5597784Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5597901Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5598014Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5598358Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5598574Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5598682Z graph_break []
2025-12-04T12:12:57.5598894Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5599616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5599732Z   warnings.warn(
2025-12-04T12:12:57.5600284Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5600403Z Traceback (most recent call last):
2025-12-04T12:12:57.5601081Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5601283Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5601503Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5601508Z 
2025-12-04T12:12:57.5601719Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5602700Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5602720Z 
2025-12-04T12:12:57.5602983Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5603199Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5603324Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5603436Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5603843Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5604075Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5604171Z graph_break []
2025-12-04T12:12:57.5604383Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5605159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5605259Z   warnings.warn(
2025-12-04T12:12:57.5605480Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5605627Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5605737Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5605961Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5606295Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5606430Z graph_break []
2025-12-04T12:12:57.5606650Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5607364Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5607474Z   warnings.warn(
2025-12-04T12:12:57.5607618Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5608171Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5608303Z Traceback (most recent call last):
2025-12-04T12:12:57.5608759Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5608965Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5609174Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5609179Z 
2025-12-04T12:12:57.5609388Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5610341Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5610347Z 
2025-12-04T12:12:57.5610606Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5610824Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5610936Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5611048Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5611391Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5611607Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5611705Z graph_break []
2025-12-04T12:12:57.5611925Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5612641Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5612751Z   warnings.warn(
2025-12-04T12:12:57.5612960Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5613071Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5613194Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5613409Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5613740Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5613853Z graph_break []
2025-12-04T12:12:57.5614063Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5614823Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5614924Z   warnings.warn(
2025-12-04T12:12:57.5615130Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5615255Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5615371Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5615632Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5615979Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5616120Z graph_break []
2025-12-04T12:12:57.5616328Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5617055Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5617187Z   warnings.warn(
2025-12-04T12:12:57.5618001Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5d8e49cfad949fb4.xml -
2025-12-04T12:12:57.5618165Z =========================== short test summary info ============================
2025-12-04T12:12:57.5619229Z FAILED [0.1597s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5619249Z 
2025-12-04T12:12:57.5619461Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5620384Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5620391Z 
2025-12-04T12:12:57.5620662Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5620835Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5621042Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.5621135Z Got exit code 1
2025-12-04T12:12:57.5621241Z Retrying single test...
2025-12-04T12:12:57.5621883Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b32d481ee6a300b7.xml
2025-12-04T12:12:57.5622044Z ============================= test session starts ==============================
2025-12-04T12:12:57.5622385Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5622499Z cachedir: .pytest_cache
2025-12-04T12:12:57.5623004Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5623138Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5623245Z configfile: pytest.ini
2025-12-04T12:12:57.5623819Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5624050Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.5625053Z stepcurrent: skipping 32 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5625178Z Running 1 items in this shard
2025-12-04T12:12:57.5625183Z 
2025-12-04T12:12:57.5626103Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5386s] [100%]
2025-12-04T12:12:57.5626992Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1617s] [100%]
2025-12-04T12:12:57.5627841Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1567s] [100%]
2025-12-04T12:12:57.5627847Z 
2025-12-04T12:12:57.5627984Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5628546Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5628695Z Traceback (most recent call last):
2025-12-04T12:12:57.5629160Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5629396Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5629602Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5629607Z 
2025-12-04T12:12:57.5629825Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5630753Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5630758Z 
2025-12-04T12:12:57.5631026Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5631240Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5631352Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5631476Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5631806Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5632022Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5632132Z graph_break []
2025-12-04T12:12:57.5632341Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5633072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5633175Z   warnings.warn(
2025-12-04T12:12:57.5633722Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5633850Z Traceback (most recent call last):
2025-12-04T12:12:57.5634307Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5634500Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5634722Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5634729Z 
2025-12-04T12:12:57.5634934Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5635871Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5635876Z 
2025-12-04T12:12:57.5636135Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5636348Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5636471Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5636586Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5636928Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5637139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5637239Z graph_break []
2025-12-04T12:12:57.5637491Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5638211Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5638308Z   warnings.warn(
2025-12-04T12:12:57.5638529Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5638663Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5638785Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5638999Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5639358Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5639467Z graph_break []
2025-12-04T12:12:57.5639677Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5640390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5640528Z   warnings.warn(
2025-12-04T12:12:57.5640669Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5641233Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5641353Z Traceback (most recent call last):
2025-12-04T12:12:57.5641808Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5642012Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5642287Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5642295Z 
2025-12-04T12:12:57.5642504Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5643450Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5643457Z 
2025-12-04T12:12:57.5643714Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5643941Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5644052Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5644164Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5644508Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5644723Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5644835Z graph_break []
2025-12-04T12:12:57.5645045Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5645757Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5645869Z   warnings.warn(
2025-12-04T12:12:57.5646075Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5646182Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5646303Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5646517Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5646855Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5650025Z graph_break []
2025-12-04T12:12:57.5650246Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5650981Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5651080Z   warnings.warn(
2025-12-04T12:12:57.5651359Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5651482Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5651593Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5651807Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5652153Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5652281Z graph_break []
2025-12-04T12:12:57.5652496Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5653217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5653350Z   warnings.warn(
2025-12-04T12:12:57.5654170Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b32d481ee6a300b7.xml -
2025-12-04T12:12:57.5654371Z =========================== short test summary info ============================
2025-12-04T12:12:57.5655434Z FAILED [0.1567s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5655441Z 
2025-12-04T12:12:57.5655666Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5656598Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5656606Z 
2025-12-04T12:12:57.5656876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5657050Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5657248Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:57.5657356Z Got exit code 1
2025-12-04T12:12:57.5658195Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5658605Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.5659229Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-de77de01625a8457.xml
2025-12-04T12:12:57.5659392Z ============================= test session starts ==============================
2025-12-04T12:12:57.5659743Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5659848Z cachedir: .pytest_cache
2025-12-04T12:12:57.5660366Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5660494Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5660601Z configfile: pytest.ini
2025-12-04T12:12:57.5661190Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5661415Z collecting ... collected 380 items / 33 deselected / 347 selected
2025-12-04T12:12:57.5661556Z stepcurrent: skipping 33 already run items.
2025-12-04T12:12:57.5661679Z Running 142 items in this shard
2025-12-04T12:12:57.5661794Z 
2025-12-04T12:12:57.5662690Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5453s] [  0%]
2025-12-04T12:12:57.5663626Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1606s] [  0%]
2025-12-04T12:12:57.5664442Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1587s] [  0%]
2025-12-04T12:12:57.5664447Z 
2025-12-04T12:12:57.5664633Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5665186Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5665306Z Traceback (most recent call last):
2025-12-04T12:12:57.5665780Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5665974Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5666185Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5666239Z 
2025-12-04T12:12:57.5666449Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5667371Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5667376Z 
2025-12-04T12:12:57.5667653Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5667868Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5667993Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5668108Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5668437Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5668664Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5668761Z graph_break []
2025-12-04T12:12:57.5668972Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5669700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5669799Z   warnings.warn(
2025-12-04T12:12:57.5670362Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5670480Z Traceback (most recent call last):
2025-12-04T12:12:57.5670938Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5671146Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5671351Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5671356Z 
2025-12-04T12:12:57.5671566Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5672507Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5672513Z 
2025-12-04T12:12:57.5672773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5673000Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5673110Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5673223Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5673626Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5673843Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5673952Z graph_break []
2025-12-04T12:12:57.5674161Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5674920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5675036Z   warnings.warn(
2025-12-04T12:12:57.5675247Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5675357Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5675485Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5675728Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5676077Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5676178Z graph_break []
2025-12-04T12:12:57.5676391Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5677118Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5677252Z   warnings.warn(
2025-12-04T12:12:57.5677393Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5677958Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5678080Z Traceback (most recent call last):
2025-12-04T12:12:57.5678559Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5678755Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5678966Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5678971Z 
2025-12-04T12:12:57.5679196Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5680122Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5680129Z 
2025-12-04T12:12:57.5680404Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5680617Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5680727Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5680853Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5681185Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5681403Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5681516Z graph_break []
2025-12-04T12:12:57.5681727Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5682551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5682655Z   warnings.warn(
2025-12-04T12:12:57.5682863Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5682988Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5683100Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5683316Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5683660Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5683758Z graph_break []
2025-12-04T12:12:57.5683982Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5684741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5684837Z   warnings.warn(
2025-12-04T12:12:57.5685060Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5685209Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5685322Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5685545Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5685872Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5685981Z graph_break []
2025-12-04T12:12:57.5686187Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5686924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5687035Z   warnings.warn(
2025-12-04T12:12:57.5687840Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-de77de01625a8457.xml -
2025-12-04T12:12:57.5688018Z =========================== short test summary info ============================
2025-12-04T12:12:57.5689115Z FAILED [0.1587s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5689121Z 
2025-12-04T12:12:57.5689330Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5690270Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5690278Z 
2025-12-04T12:12:57.5690539Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5690727Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5690917Z ================== 1 failed, 33 deselected, 2 rerun in 4.92s ===================
2025-12-04T12:12:57.5691016Z Got exit code 1
2025-12-04T12:12:57.5691136Z Retrying single test...
2025-12-04T12:12:57.5691758Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ac8e542231b9ece8.xml
2025-12-04T12:12:57.5691931Z ============================= test session starts ==============================
2025-12-04T12:12:57.5692279Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5692388Z cachedir: .pytest_cache
2025-12-04T12:12:57.5692909Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5693036Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5693144Z configfile: pytest.ini
2025-12-04T12:12:57.5693732Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5693961Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.5694982Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5695095Z Running 1 items in this shard
2025-12-04T12:12:57.5695100Z 
2025-12-04T12:12:57.5695987Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5542s] [100%]
2025-12-04T12:12:57.5696927Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1622s] [100%]
2025-12-04T12:12:57.5697771Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1583s] [100%]
2025-12-04T12:12:57.5697779Z 
2025-12-04T12:12:57.5697929Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5698478Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5698609Z Traceback (most recent call last):
2025-12-04T12:12:57.5699116Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5699312Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5699534Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5699539Z 
2025-12-04T12:12:57.5699747Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5700685Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5700735Z 
2025-12-04T12:12:57.5701175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5701389Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5701514Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5701631Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5701964Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5702199Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5702298Z graph_break []
2025-12-04T12:12:57.5702527Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5703251Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5703355Z   warnings.warn(
2025-12-04T12:12:57.5703916Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5704037Z Traceback (most recent call last):
2025-12-04T12:12:57.5704495Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5704707Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5704912Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5704920Z 
2025-12-04T12:12:57.5705142Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5706071Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5706080Z 
2025-12-04T12:12:57.5706350Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5706566Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5706676Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5706804Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5707138Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5707353Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5707462Z graph_break []
2025-12-04T12:12:57.5707761Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5708494Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5708597Z   warnings.warn(
2025-12-04T12:12:57.5708853Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5708979Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5709090Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5709304Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5709650Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5709749Z graph_break []
2025-12-04T12:12:57.5710000Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5710726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5710827Z   warnings.warn(
2025-12-04T12:12:57.5710980Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5711532Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5711692Z Traceback (most recent call last):
2025-12-04T12:12:57.5712168Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5712363Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5712582Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5712586Z 
2025-12-04T12:12:57.5712797Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5713730Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5713738Z 
2025-12-04T12:12:57.5714010Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5714222Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5714344Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5714459Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5714789Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5715011Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5715106Z graph_break []
2025-12-04T12:12:57.5715318Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5716040Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5716140Z   warnings.warn(
2025-12-04T12:12:57.5716359Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5716466Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5716578Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5716806Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5717133Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5717226Z graph_break []
2025-12-04T12:12:57.5717446Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5718162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5718270Z   warnings.warn(
2025-12-04T12:12:57.5718514Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5718621Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5718745Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5718959Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5719315Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5719427Z graph_break []
2025-12-04T12:12:57.5719637Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5720360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5720458Z   warnings.warn(
2025-12-04T12:12:57.5721284Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ac8e542231b9ece8.xml -
2025-12-04T12:12:57.5721469Z =========================== short test summary info ============================
2025-12-04T12:12:57.5722603Z FAILED [0.1583s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5722645Z 
2025-12-04T12:12:57.5722874Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5723797Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5723803Z 
2025-12-04T12:12:57.5724062Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5724255Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5724451Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.5724559Z Got exit code 1
2025-12-04T12:12:57.5724662Z Retrying single test...
2025-12-04T12:12:57.5725283Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8a7277668f29c6c0.xml
2025-12-04T12:12:57.5725457Z ============================= test session starts ==============================
2025-12-04T12:12:57.5725797Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5725904Z cachedir: .pytest_cache
2025-12-04T12:12:57.5726424Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5726545Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5726663Z configfile: pytest.ini
2025-12-04T12:12:57.5727236Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5727460Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.5728490Z stepcurrent: skipping 33 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5728605Z Running 1 items in this shard
2025-12-04T12:12:57.5728611Z 
2025-12-04T12:12:57.5729509Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5321s] [100%]
2025-12-04T12:12:57.5730393Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1599s] [100%]
2025-12-04T12:12:57.5731259Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1572s] [100%]
2025-12-04T12:12:57.5731264Z 
2025-12-04T12:12:57.5731413Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5731995Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5732127Z Traceback (most recent call last):
2025-12-04T12:12:57.5732589Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5732800Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5733042Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5733047Z 
2025-12-04T12:12:57.5733259Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5734214Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5734219Z 
2025-12-04T12:12:57.5734481Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5734742Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5734854Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5734969Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5735317Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5735531Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5735629Z graph_break []
2025-12-04T12:12:57.5735858Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5736584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5736700Z   warnings.warn(
2025-12-04T12:12:57.5737255Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5737376Z Traceback (most recent call last):
2025-12-04T12:12:57.5737854Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5738047Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5738256Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5738260Z 
2025-12-04T12:12:57.5738485Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5739405Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5739412Z 
2025-12-04T12:12:57.5739687Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5739901Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5740018Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5740149Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5740481Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5740714Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5740814Z graph_break []
2025-12-04T12:12:57.5741026Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5741762Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5741902Z   warnings.warn(
2025-12-04T12:12:57.5742111Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5742231Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5742345Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5742604Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5742938Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5743033Z graph_break []
2025-12-04T12:12:57.5743259Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5744022Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5744127Z   warnings.warn(
2025-12-04T12:12:57.5744284Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5744841Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5744976Z Traceback (most recent call last):
2025-12-04T12:12:57.5745436Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5745661Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5745877Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5745882Z 
2025-12-04T12:12:57.5746087Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5747028Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5747033Z 
2025-12-04T12:12:57.5747289Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5747500Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5747621Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5747733Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5748062Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5748287Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5748379Z graph_break []
2025-12-04T12:12:57.5748598Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5749316Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5749410Z   warnings.warn(
2025-12-04T12:12:57.5749632Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5749742Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5749853Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5750079Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5750405Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5750517Z graph_break []
2025-12-04T12:12:57.5750728Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5751440Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5751551Z   warnings.warn(
2025-12-04T12:12:57.5751758Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5751868Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5751990Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5752206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5752586Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5752684Z graph_break []
2025-12-04T12:12:57.5752892Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5753657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5753756Z   warnings.warn(
2025-12-04T12:12:57.5754566Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8a7277668f29c6c0.xml -
2025-12-04T12:12:57.5754764Z =========================== short test summary info ============================
2025-12-04T12:12:57.5755830Z FAILED [0.1572s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5755838Z 
2025-12-04T12:12:57.5756061Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5756986Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5757023Z 
2025-12-04T12:12:57.5757299Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5757474Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5757668Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ==================
2025-12-04T12:12:57.5757778Z Got exit code 1
2025-12-04T12:12:57.5758621Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5759036Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.5759664Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cc01ae0bb83689a0.xml
2025-12-04T12:12:57.5759833Z ============================= test session starts ==============================
2025-12-04T12:12:57.5760186Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5760293Z cachedir: .pytest_cache
2025-12-04T12:12:57.5760815Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5760936Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5761042Z configfile: pytest.ini
2025-12-04T12:12:57.5761631Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5761857Z collecting ... collected 380 items / 34 deselected / 346 selected
2025-12-04T12:12:57.5761997Z stepcurrent: skipping 34 already run items.
2025-12-04T12:12:57.5762229Z Running 141 items in this shard
2025-12-04T12:12:57.5762242Z 
2025-12-04T12:12:57.5763261Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.5764171Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5439s] [  1%]
2025-12-04T12:12:57.5765049Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1619s] [  1%]
2025-12-04T12:12:57.5765919Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False FAILED [0.1564s] [  1%]
2025-12-04T12:12:57.5765927Z 
2025-12-04T12:12:57.5766096Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5766640Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5766773Z Traceback (most recent call last):
2025-12-04T12:12:57.5767233Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5767471Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5767679Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5767687Z 
2025-12-04T12:12:57.5767894Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5768828Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5768866Z 
2025-12-04T12:12:57.5769130Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5769355Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5769466Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5769580Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5769925Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5770139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5770234Z graph_break []
2025-12-04T12:12:57.5770460Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5771177Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5771288Z   warnings.warn(
2025-12-04T12:12:57.5771837Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5771953Z Traceback (most recent call last):
2025-12-04T12:12:57.5772422Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5772614Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5772820Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5772839Z 
2025-12-04T12:12:57.5773045Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5773963Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5773969Z 
2025-12-04T12:12:57.5774238Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5774455Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5774564Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5774687Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5775016Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5775242Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5775338Z graph_break []
2025-12-04T12:12:57.5775548Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5776272Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5776415Z   warnings.warn(
2025-12-04T12:12:57.5776624Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5776772Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5776890Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5777117Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5777444Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5777539Z graph_break []
2025-12-04T12:12:57.5777760Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5778504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5778607Z   warnings.warn(
2025-12-04T12:12:57.5778760Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5779307Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5779437Z Traceback (most recent call last):
2025-12-04T12:12:57.5779934Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5780126Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5780344Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5780349Z 
2025-12-04T12:12:57.5780557Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5781493Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5781501Z 
2025-12-04T12:12:57.5781764Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5781974Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5782094Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5782212Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5782554Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5782770Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5782865Z graph_break []
2025-12-04T12:12:57.5783084Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5783796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5783894Z   warnings.warn(
2025-12-04T12:12:57.5784118Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5784227Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5784352Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5784564Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5784898Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5822241Z graph_break []
2025-12-04T12:12:57.5822640Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5823377Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5823517Z   warnings.warn(
2025-12-04T12:12:57.5823735Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5823846Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5824182Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5824406Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5824758Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5824855Z graph_break []
2025-12-04T12:12:57.5825142Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5825874Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5825974Z   warnings.warn(
2025-12-04T12:12:57.5826830Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cc01ae0bb83689a0.xml -
2025-12-04T12:12:57.5827012Z =========================== short test summary info ============================
2025-12-04T12:12:57.5828075Z FAILED [0.1564s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5828084Z 
2025-12-04T12:12:57.5828316Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5829292Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5829299Z 
2025-12-04T12:12:57.5829577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5829759Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5829969Z ============= 1 failed, 1 skipped, 34 deselected, 2 rerun in 4.92s =============
2025-12-04T12:12:57.5830086Z Got exit code 1
2025-12-04T12:12:57.5830194Z Retrying single test...
2025-12-04T12:12:57.5830825Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-99d5fd7f63dbe293.xml
2025-12-04T12:12:57.5831000Z ============================= test session starts ==============================
2025-12-04T12:12:57.5831349Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5831467Z cachedir: .pytest_cache
2025-12-04T12:12:57.5831982Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5832111Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5832230Z configfile: pytest.ini
2025-12-04T12:12:57.5832808Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5833030Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.5834044Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5834156Z Running 1 items in this shard
2025-12-04T12:12:57.5834166Z 
2025-12-04T12:12:57.5835063Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5284s] [100%]
2025-12-04T12:12:57.5835950Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1600s] [100%]
2025-12-04T12:12:57.5836765Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False FAILED [0.1580s] [100%]
2025-12-04T12:12:57.5836807Z 
2025-12-04T12:12:57.5836947Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5837524Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5837661Z Traceback (most recent call last):
2025-12-04T12:12:57.5838127Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5838335Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5838540Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5838545Z 
2025-12-04T12:12:57.5838786Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5839728Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5839736Z 
2025-12-04T12:12:57.5839995Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5840219Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5840378Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5840489Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5840836Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5841050Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5841145Z graph_break []
2025-12-04T12:12:57.5841372Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5842102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5842303Z   warnings.warn(
2025-12-04T12:12:57.5842847Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5842965Z Traceback (most recent call last):
2025-12-04T12:12:57.5843579Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5843781Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5843988Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5844009Z 
2025-12-04T12:12:57.5844273Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5845194Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5845201Z 
2025-12-04T12:12:57.5845474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5845687Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5845795Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5845920Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5846256Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5846481Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5846575Z graph_break []
2025-12-04T12:12:57.5846786Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5847520Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5847619Z   warnings.warn(
2025-12-04T12:12:57.5847831Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5848014Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5848127Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5848357Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5848721Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5848821Z graph_break []
2025-12-04T12:12:57.5849050Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5849760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5849857Z   warnings.warn(
2025-12-04T12:12:57.5850048Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5850594Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5850729Z Traceback (most recent call last):
2025-12-04T12:12:57.5851188Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5851382Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5851642Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5851648Z 
2025-12-04T12:12:57.5851855Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5852792Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5852797Z 
2025-12-04T12:12:57.5853058Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5853272Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5853398Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5853509Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5853854Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5854068Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5854169Z graph_break []
2025-12-04T12:12:57.5854392Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5855106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5855203Z   warnings.warn(
2025-12-04T12:12:57.5855428Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5855535Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5855661Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5855874Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5856202Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5856311Z graph_break []
2025-12-04T12:12:57.5856526Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5857238Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5857352Z   warnings.warn(
2025-12-04T12:12:57.5857559Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5857679Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5857788Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5858003Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5858352Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5858500Z graph_break []
2025-12-04T12:12:57.5858708Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5859459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5859559Z   warnings.warn(
2025-12-04T12:12:57.5860371Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-99d5fd7f63dbe293.xml -
2025-12-04T12:12:57.5860539Z =========================== short test summary info ============================
2025-12-04T12:12:57.5861623Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5861632Z 
2025-12-04T12:12:57.5861858Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5862777Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5862818Z 
2025-12-04T12:12:57.5863093Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5863267Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5863461Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ==================
2025-12-04T12:12:57.5863567Z Got exit code 1
2025-12-04T12:12:57.5863669Z Retrying single test...
2025-12-04T12:12:57.5864305Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-826bb35711c419f6.xml
2025-12-04T12:12:57.5864464Z ============================= test session starts ==============================
2025-12-04T12:12:57.5864807Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5864929Z cachedir: .pytest_cache
2025-12-04T12:12:57.5865440Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5865566Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5865683Z configfile: pytest.ini
2025-12-04T12:12:57.5866258Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5866496Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.5867500Z stepcurrent: skipping 35 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5867615Z Running 1 items in this shard
2025-12-04T12:12:57.5867620Z 
2025-12-04T12:12:57.5868516Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5563s] [100%]
2025-12-04T12:12:57.5869406Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1610s] [100%]
2025-12-04T12:12:57.5870223Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False FAILED [0.1585s] [100%]
2025-12-04T12:12:57.5870231Z 
2025-12-04T12:12:57.5870368Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5870928Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5871088Z Traceback (most recent call last):
2025-12-04T12:12:57.5871550Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5871789Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5872000Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5872005Z 
2025-12-04T12:12:57.5872228Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5873258Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5873263Z 
2025-12-04T12:12:57.5873522Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5873752Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5873863Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5873973Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5874316Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5874562Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5874672Z graph_break []
2025-12-04T12:12:57.5874882Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5875597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5875710Z   warnings.warn(
2025-12-04T12:12:57.5876255Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5876387Z Traceback (most recent call last):
2025-12-04T12:12:57.5876845Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5877034Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5877253Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5877263Z 
2025-12-04T12:12:57.5877471Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5878397Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5878415Z 
2025-12-04T12:12:57.5878674Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5878883Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5879007Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5879122Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5879454Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5879681Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5879776Z graph_break []
2025-12-04T12:12:57.5880003Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5880718Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5880815Z   warnings.warn(
2025-12-04T12:12:57.5881038Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5881146Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5881261Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5881484Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5881865Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5881975Z graph_break []
2025-12-04T12:12:57.5882265Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5883016Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5883132Z   warnings.warn(
2025-12-04T12:12:57.5883270Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5883820Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.5884000Z Traceback (most recent call last):
2025-12-04T12:12:57.5884462Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5884669Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5884876Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5884881Z 
2025-12-04T12:12:57.5885087Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5886026Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5886065Z 
2025-12-04T12:12:57.5886328Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5886555Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5886664Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5886778Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5887122Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5887339Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5887437Z graph_break []
2025-12-04T12:12:57.5887663Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5888380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5888495Z   warnings.warn(
2025-12-04T12:12:57.5888705Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5888811Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5888935Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5889151Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5889476Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5889584Z graph_break []
2025-12-04T12:12:57.5889796Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5890519Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5890615Z   warnings.warn(
2025-12-04T12:12:57.5890829Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5890946Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5891058Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5891271Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5891609Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5891703Z graph_break []
2025-12-04T12:12:57.5891927Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5892637Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5892770Z   warnings.warn(
2025-12-04T12:12:57.5893582Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-826bb35711c419f6.xml -
2025-12-04T12:12:57.5893802Z =========================== short test summary info ============================
2025-12-04T12:12:57.5894871Z FAILED [0.1585s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5894877Z 
2025-12-04T12:12:57.5895119Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5896038Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5896045Z 
2025-12-04T12:12:57.5896321Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5896499Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5896740Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.5896841Z Got exit code 1
2025-12-04T12:12:57.5897684Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.5898102Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.5898727Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e5e187e59c02465d.xml
2025-12-04T12:12:57.5898903Z ============================= test session starts ==============================
2025-12-04T12:12:57.5899248Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5899357Z cachedir: .pytest_cache
2025-12-04T12:12:57.5899880Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5900005Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5900116Z configfile: pytest.ini
2025-12-04T12:12:57.5900705Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5901189Z collecting ... collected 380 items / 36 deselected / 344 selected
2025-12-04T12:12:57.5901350Z stepcurrent: skipping 36 already run items.
2025-12-04T12:12:57.5901462Z Running 139 items in this shard
2025-12-04T12:12:57.5901468Z 
2025-12-04T12:12:57.5902479Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.5903384Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5489s] [  1%]
2025-12-04T12:12:57.5904271Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1604s] [  1%]
2025-12-04T12:12:57.5905095Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1578s] [  1%]
2025-12-04T12:12:57.5905101Z 
2025-12-04T12:12:57.5905347Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5905911Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5906031Z Traceback (most recent call last):
2025-12-04T12:12:57.5906540Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5906752Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5906964Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5906969Z 
2025-12-04T12:12:57.5907190Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5908166Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5908174Z 
2025-12-04T12:12:57.5908438Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5908665Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5908774Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5908902Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5909276Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5909492Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5909608Z graph_break []
2025-12-04T12:12:57.5909818Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5910540Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5910654Z   warnings.warn(
2025-12-04T12:12:57.5911205Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5911339Z Traceback (most recent call last):
2025-12-04T12:12:57.5911797Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5911991Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5912213Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5912218Z 
2025-12-04T12:12:57.5912425Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5913356Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5913378Z 
2025-12-04T12:12:57.5913639Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5913854Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5913975Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5914084Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5914413Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5914644Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5914738Z graph_break []
2025-12-04T12:12:57.5914959Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5915675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5915775Z   warnings.warn(
2025-12-04T12:12:57.5915995Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5916101Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5916249Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5916474Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5916803Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5916913Z graph_break []
2025-12-04T12:12:57.5917147Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5917862Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5917971Z   warnings.warn(
2025-12-04T12:12:57.5918110Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5918691Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5918821Z Traceback (most recent call last):
2025-12-04T12:12:57.5919286Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5919492Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5919698Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5919703Z 
2025-12-04T12:12:57.5919912Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5920879Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5920917Z 
2025-12-04T12:12:57.5921175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5921398Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5921506Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5921616Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5921957Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5922238Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5922333Z graph_break []
2025-12-04T12:12:57.5922561Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5923275Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5923388Z   warnings.warn(
2025-12-04T12:12:57.5923594Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5923704Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5923829Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5924045Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5924372Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5924485Z graph_break []
2025-12-04T12:12:57.5924696Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5925424Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5925522Z   warnings.warn(
2025-12-04T12:12:57.5925731Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5925860Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5925970Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5926185Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5926528Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5926623Z graph_break []
2025-12-04T12:12:57.5926843Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5927598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5927696Z   warnings.warn(
2025-12-04T12:12:57.5928552Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e5e187e59c02465d.xml -
2025-12-04T12:12:57.5928725Z =========================== short test summary info ============================
2025-12-04T12:12:57.5929817Z FAILED [0.1578s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5929824Z 
2025-12-04T12:12:57.5930036Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5930966Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5930985Z 
2025-12-04T12:12:57.5931247Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5931460Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5931684Z ============= 1 failed, 1 skipped, 36 deselected, 2 rerun in 4.92s =============
2025-12-04T12:12:57.5931779Z Got exit code 1
2025-12-04T12:12:57.5931888Z Retrying single test...
2025-12-04T12:12:57.5932528Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4a8119bc665e27c0.xml
2025-12-04T12:12:57.5932691Z ============================= test session starts ==============================
2025-12-04T12:12:57.5933044Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5933153Z cachedir: .pytest_cache
2025-12-04T12:12:57.5933660Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5933793Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5933905Z configfile: pytest.ini
2025-12-04T12:12:57.5934481Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5934714Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.5935722Z stepcurrent: skipping 37 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5935846Z Running 1 items in this shard
2025-12-04T12:12:57.5935853Z 
2025-12-04T12:12:57.5936750Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5561s] [100%]
2025-12-04T12:12:57.5937659Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1639s] [100%]
2025-12-04T12:12:57.5938466Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1575s] [100%]
2025-12-04T12:12:57.5938471Z 
2025-12-04T12:12:57.5938608Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5939174Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5939343Z Traceback (most recent call last):
2025-12-04T12:12:57.5939817Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5940010Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5940217Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5940256Z 
2025-12-04T12:12:57.5940478Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5941409Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5941414Z 
2025-12-04T12:12:57.5941717Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5941931Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5942041Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5942167Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5942499Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5942717Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5942823Z graph_break []
2025-12-04T12:12:57.5943079Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5943811Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5943910Z   warnings.warn(
2025-12-04T12:12:57.5944459Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5944595Z Traceback (most recent call last):
2025-12-04T12:12:57.5945055Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5945265Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5945471Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5945476Z 
2025-12-04T12:12:57.5945686Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5946633Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5946638Z 
2025-12-04T12:12:57.5946899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5947127Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5947239Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5947351Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5947698Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5947916Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5948012Z graph_break []
2025-12-04T12:12:57.5948236Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5948958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5949075Z   warnings.warn(
2025-12-04T12:12:57.5949286Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5949394Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5949521Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5949738Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5950065Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5950217Z graph_break []
2025-12-04T12:12:57.5950424Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5951151Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5951283Z   warnings.warn(
2025-12-04T12:12:57.5951424Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5951995Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5952117Z Traceback (most recent call last):
2025-12-04T12:12:57.5952621Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5952815Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5953021Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5953028Z 
2025-12-04T12:12:57.5953250Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5954180Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5954214Z 
2025-12-04T12:12:57.5954485Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5954694Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5954806Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5954932Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5955262Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5955471Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5955580Z graph_break []
2025-12-04T12:12:57.5955785Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5956510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5956609Z   warnings.warn(
2025-12-04T12:12:57.5956818Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5956936Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5957045Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5957256Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5957595Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5957690Z graph_break []
2025-12-04T12:12:57.5957910Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5958619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5958717Z   warnings.warn(
2025-12-04T12:12:57.5958936Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5959044Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5959159Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5959385Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5959713Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5959819Z graph_break []
2025-12-04T12:12:57.5960027Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5960735Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5960878Z   warnings.warn(
2025-12-04T12:12:57.5961671Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4a8119bc665e27c0.xml -
2025-12-04T12:12:57.5961842Z =========================== short test summary info ============================
2025-12-04T12:12:57.5963025Z FAILED [0.1575s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5963033Z 
2025-12-04T12:12:57.5963246Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5964214Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5964222Z 
2025-12-04T12:12:57.5964482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5964669Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5964864Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.5964960Z Got exit code 1
2025-12-04T12:12:57.5965116Z Retrying single test...
2025-12-04T12:12:57.5965741Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2025dabe1cea3938.xml
2025-12-04T12:12:57.5965899Z ============================= test session starts ==============================
2025-12-04T12:12:57.5966255Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.5966364Z cachedir: .pytest_cache
2025-12-04T12:12:57.5966880Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.5967003Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.5967110Z configfile: pytest.ini
2025-12-04T12:12:57.5967699Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.5967924Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.5968948Z stepcurrent: skipping 37 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5969058Z Running 1 items in this shard
2025-12-04T12:12:57.5969063Z 
2025-12-04T12:12:57.5969950Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5387s] [100%]
2025-12-04T12:12:57.5970850Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1601s] [100%]
2025-12-04T12:12:57.5971658Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1579s] [100%]
2025-12-04T12:12:57.5971665Z 
2025-12-04T12:12:57.5971814Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.5972363Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5972483Z Traceback (most recent call last):
2025-12-04T12:12:57.5972958Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5973153Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5973428Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5973433Z 
2025-12-04T12:12:57.5973640Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5974600Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5974623Z 
2025-12-04T12:12:57.5974884Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5975098Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5975222Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5975333Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5975690Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5975921Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5976021Z graph_break []
2025-12-04T12:12:57.5976250Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5976967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5977098Z   warnings.warn(
2025-12-04T12:12:57.5977658Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5977776Z Traceback (most recent call last):
2025-12-04T12:12:57.5978229Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5978436Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5978640Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5978648Z 
2025-12-04T12:12:57.5978866Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5979793Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5979803Z 
2025-12-04T12:12:57.5980060Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5980283Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5980391Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5980514Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5980845Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5981062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5981167Z graph_break []
2025-12-04T12:12:57.5981376Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5982089Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5982199Z   warnings.warn(
2025-12-04T12:12:57.5982407Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5982532Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5982642Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5982852Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5983192Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5983287Z graph_break []
2025-12-04T12:12:57.5983496Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5984220Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5984350Z   warnings.warn(
2025-12-04T12:12:57.5984505Z =================================== FAILURES ===================================
2025-12-04T12:12:57.5985086Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.5985206Z Traceback (most recent call last):
2025-12-04T12:12:57.5985672Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.5985863Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.5986066Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5986083Z 
2025-12-04T12:12:57.5986318Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5987243Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5987251Z 
2025-12-04T12:12:57.5987520Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5987732Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5987883Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5987996Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5988324Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5988547Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5988641Z graph_break []
2025-12-04T12:12:57.5988852Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5989576Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5989675Z   warnings.warn(
2025-12-04T12:12:57.5989898Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5990005Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5990114Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5990344Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5990670Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5990765Z graph_break []
2025-12-04T12:12:57.5990988Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5991700Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5991812Z   warnings.warn(
2025-12-04T12:12:57.5992022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.5992135Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.5992259Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.5992472Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.5992800Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.5992912Z graph_break []
2025-12-04T12:12:57.5993120Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.5993830Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.5993938Z   warnings.warn(
2025-12-04T12:12:57.5994739Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2025dabe1cea3938.xml -
2025-12-04T12:12:57.5994954Z =========================== short test summary info ============================
2025-12-04T12:12:57.5996014Z FAILED [0.1579s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.5996053Z 
2025-12-04T12:12:57.5996278Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.5997203Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5997208Z 
2025-12-04T12:12:57.5997495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.5997684Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.5997880Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:57.5997991Z Got exit code 1
2025-12-04T12:12:57.5998834Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.5999268Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.5999903Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-9d80dad9de413e50.xml
2025-12-04T12:12:57.6000064Z ============================= test session starts ==============================
2025-12-04T12:12:57.6000419Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6000527Z cachedir: .pytest_cache
2025-12-04T12:12:57.6001464Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6001605Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6001711Z configfile: pytest.ini
2025-12-04T12:12:57.6002350Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6002597Z collecting ... collected 380 items / 38 deselected / 342 selected
2025-12-04T12:12:57.6002737Z stepcurrent: skipping 38 already run items.
2025-12-04T12:12:57.6002863Z Running 137 items in this shard
2025-12-04T12:12:57.6002868Z 
2025-12-04T12:12:57.6003880Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.6004874Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0030s] (Skip non-critical tests to save resources.) [  1%]
2025-12-04T12:12:57.6005771Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5710s] [  2%]
2025-12-04T12:12:57.6006654Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1680s] [  2%]
2025-12-04T12:12:57.6007472Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1655s] [  2%]
2025-12-04T12:12:57.6007480Z 
2025-12-04T12:12:57.6007618Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6008173Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6008380Z Traceback (most recent call last):
2025-12-04T12:12:57.6008842Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6009109Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6009323Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6009328Z 
2025-12-04T12:12:57.6009551Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6010510Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6010516Z 
2025-12-04T12:12:57.6010779Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6011012Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6011127Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6011256Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6011588Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6011865Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6011979Z graph_break []
2025-12-04T12:12:57.6012192Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6014862Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6014988Z   return x.grad, w.grad
2025-12-04T12:12:57.6015708Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6015824Z   warnings.warn(
2025-12-04T12:12:57.6018461Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6018583Z   return x.grad, w.grad
2025-12-04T12:12:57.6019123Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6019256Z Traceback (most recent call last):
2025-12-04T12:12:57.6019714Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6019909Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6020132Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6020137Z 
2025-12-04T12:12:57.6020343Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6021263Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6021318Z 
2025-12-04T12:12:57.6021577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6021790Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6021911Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6022053Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6022386Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6022612Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6022706Z graph_break []
2025-12-04T12:12:57.6022925Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6025603Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6025748Z   return x.grad, w.grad
2025-12-04T12:12:57.6026466Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6026562Z   warnings.warn(
2025-12-04T12:12:57.6029212Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6029317Z   return x.grad, w.grad
2025-12-04T12:12:57.6029549Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6029655Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6029766Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6029992Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6030319Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6030415Z graph_break []
2025-12-04T12:12:57.6030637Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6033268Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6033389Z   return x.grad, w.grad
2025-12-04T12:12:57.6034102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6034221Z   warnings.warn(
2025-12-04T12:12:57.6036891Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6037042Z   return x.grad, w.grad
2025-12-04T12:12:57.6037183Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6037720Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6037880Z Traceback (most recent call last):
2025-12-04T12:12:57.6038340Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6038544Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6038750Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6038755Z 
2025-12-04T12:12:57.6038964Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6039896Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6039931Z 
2025-12-04T12:12:57.6040193Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6040415Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6040522Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6040634Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6040977Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6041190Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6041286Z graph_break []
2025-12-04T12:12:57.6041508Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6044227Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6044350Z   return x.grad, w.grad
2025-12-04T12:12:57.6045065Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6045180Z   warnings.warn(
2025-12-04T12:12:57.6047840Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6047960Z   return x.grad, w.grad
2025-12-04T12:12:57.6048177Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6048287Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6048455Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6048672Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6049004Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6049110Z graph_break []
2025-12-04T12:12:57.6049348Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6052026Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6052133Z   return x.grad, w.grad
2025-12-04T12:12:57.6052844Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6052954Z   warnings.warn(
2025-12-04T12:12:57.6055581Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6055736Z   return x.grad, w.grad
2025-12-04T12:12:57.6055953Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6056074Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6056186Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6056402Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6056746Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6056843Z graph_break []
2025-12-04T12:12:57.6057050Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6057771Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6057870Z   warnings.warn(
2025-12-04T12:12:57.6060505Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6060612Z   return x.grad, w.grad
2025-12-04T12:12:57.6061427Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-9d80dad9de413e50.xml -
2025-12-04T12:12:57.6061593Z =========================== short test summary info ============================
2025-12-04T12:12:57.6062638Z FAILED [0.1655s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6062690Z 
2025-12-04T12:12:57.6062899Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6063914Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6063922Z 
2025-12-04T12:12:57.6064196Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6064370Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6064580Z ============= 1 failed, 2 skipped, 38 deselected, 2 rerun in 4.97s =============
2025-12-04T12:12:57.6064688Z Got exit code 1
2025-12-04T12:12:57.6064836Z Retrying single test...
2025-12-04T12:12:57.6065475Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d8878b3838c421bc.xml
2025-12-04T12:12:57.6065634Z ============================= test session starts ==============================
2025-12-04T12:12:57.6065973Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6066098Z cachedir: .pytest_cache
2025-12-04T12:12:57.6066607Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6066769Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6066877Z configfile: pytest.ini
2025-12-04T12:12:57.6067451Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6067687Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6068687Z stepcurrent: skipping 40 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6068800Z Running 1 items in this shard
2025-12-04T12:12:57.6068816Z 
2025-12-04T12:12:57.6069699Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5613s] [100%]
2025-12-04T12:12:57.6070580Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1662s] [100%]
2025-12-04T12:12:57.6071393Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1638s] [100%]
2025-12-04T12:12:57.6071399Z 
2025-12-04T12:12:57.6071535Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6072088Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6072205Z Traceback (most recent call last):
2025-12-04T12:12:57.6072665Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6072870Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6073077Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6073081Z 
2025-12-04T12:12:57.6073300Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6074222Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6074227Z 
2025-12-04T12:12:57.6074487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6074762Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6074872Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6074996Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6075356Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6075571Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6075677Z graph_break []
2025-12-04T12:12:57.6075887Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6078577Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6078685Z   return x.grad, w.grad
2025-12-04T12:12:57.6079408Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6079547Z   warnings.warn(
2025-12-04T12:12:57.6082271Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6082394Z   return x.grad, w.grad
2025-12-04T12:12:57.6082932Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6083077Z Traceback (most recent call last):
2025-12-04T12:12:57.6083537Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6083731Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6083949Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6083955Z 
2025-12-04T12:12:57.6084165Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6085093Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6085100Z 
2025-12-04T12:12:57.6085361Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6085573Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6085703Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6085816Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6086146Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6086370Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6086465Z graph_break []
2025-12-04T12:12:57.6086691Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6089382Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6089531Z   return x.grad, w.grad
2025-12-04T12:12:57.6090245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6090343Z   warnings.warn(
2025-12-04T12:12:57.6093033Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6093171Z   return x.grad, w.grad
2025-12-04T12:12:57.6093406Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6093514Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6093624Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6093853Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6094187Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6094302Z graph_break []
2025-12-04T12:12:57.6094514Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6097155Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6097276Z   return x.grad, w.grad
2025-12-04T12:12:57.6097994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6098108Z   warnings.warn(
2025-12-04T12:12:57.6100743Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6101043Z   return x.grad, w.grad
2025-12-04T12:12:57.6101188Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6101730Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6101868Z Traceback (most recent call last):
2025-12-04T12:12:57.6102329Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6102617Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6102826Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6102831Z 
2025-12-04T12:12:57.6103040Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6104020Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6104027Z 
2025-12-04T12:12:57.6104290Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6104518Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6104667Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6104781Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6105125Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6105340Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6105436Z graph_break []
2025-12-04T12:12:57.6105663Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6108307Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6108468Z   return x.grad, w.grad
2025-12-04T12:12:57.6109187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6109299Z   warnings.warn(
2025-12-04T12:12:57.6111937Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6112057Z   return x.grad, w.grad
2025-12-04T12:12:57.6112268Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6112378Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6112504Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6112720Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6113049Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6113160Z graph_break []
2025-12-04T12:12:57.6113373Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6116015Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6116149Z   return x.grad, w.grad
2025-12-04T12:12:57.6116876Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6117019Z   warnings.warn(
2025-12-04T12:12:57.6119679Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6119796Z   return x.grad, w.grad
2025-12-04T12:12:57.6120004Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6120122Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6120235Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6120447Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6120820Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6120916Z graph_break []
2025-12-04T12:12:57.6121124Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6121850Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6121952Z   warnings.warn(
2025-12-04T12:12:57.6124664Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6124775Z   return x.grad, w.grad
2025-12-04T12:12:57.6125589Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d8878b3838c421bc.xml -
2025-12-04T12:12:57.6125759Z =========================== short test summary info ============================
2025-12-04T12:12:57.6126806Z FAILED [0.1638s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6126829Z 
2025-12-04T12:12:57.6127044Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6127969Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6127976Z 
2025-12-04T12:12:57.6128246Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6128421Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6128634Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.6128755Z Got exit code 1
2025-12-04T12:12:57.6128887Z Retrying single test...
2025-12-04T12:12:57.6129539Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ccdefc43a9a17fe4.xml
2025-12-04T12:12:57.6129779Z ============================= test session starts ==============================
2025-12-04T12:12:57.6130150Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6130289Z cachedir: .pytest_cache
2025-12-04T12:12:57.6131471Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6131600Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6131705Z configfile: pytest.ini
2025-12-04T12:12:57.6132294Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6132546Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6133561Z stepcurrent: skipping 40 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6133675Z Running 1 items in this shard
2025-12-04T12:12:57.6133680Z 
2025-12-04T12:12:57.6134568Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5443s] [100%]
2025-12-04T12:12:57.6135491Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1640s] [100%]
2025-12-04T12:12:57.6136291Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1612s] [100%]
2025-12-04T12:12:57.6136296Z 
2025-12-04T12:12:57.6136446Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6136988Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6137123Z Traceback (most recent call last):
2025-12-04T12:12:57.6137586Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6137785Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6138007Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6138012Z 
2025-12-04T12:12:57.6138223Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6139153Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6139158Z 
2025-12-04T12:12:57.6139425Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6139641Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6139767Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6139880Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6140219Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6140449Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6140546Z graph_break []
2025-12-04T12:12:57.6140771Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6143416Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6143569Z   return x.grad, w.grad
2025-12-04T12:12:57.6144321Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6144421Z   warnings.warn(
2025-12-04T12:12:57.6147087Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6147194Z   return x.grad, w.grad
2025-12-04T12:12:57.6147748Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6147896Z Traceback (most recent call last):
2025-12-04T12:12:57.6148358Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6148567Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6148775Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6148780Z 
2025-12-04T12:12:57.6149010Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6149925Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6149932Z 
2025-12-04T12:12:57.6150202Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6150418Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6150529Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6150654Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6150991Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6151204Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6151313Z graph_break []
2025-12-04T12:12:57.6151527Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6154174Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6154282Z   return x.grad, w.grad
2025-12-04T12:12:57.6155000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6155108Z   warnings.warn(
2025-12-04T12:12:57.6157771Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6157920Z   return x.grad, w.grad
2025-12-04T12:12:57.6158132Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6158254Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6158364Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6158585Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6158953Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6159052Z graph_break []
2025-12-04T12:12:57.6159262Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6161922Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6162055Z   return x.grad, w.grad
2025-12-04T12:12:57.6162858Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6162961Z   warnings.warn(
2025-12-04T12:12:57.6165608Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6165715Z   return x.grad, w.grad
2025-12-04T12:12:57.6165872Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6166413Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6166532Z Traceback (most recent call last):
2025-12-04T12:12:57.6167007Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6167202Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6167409Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6167426Z 
2025-12-04T12:12:57.6167637Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6168696Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6168702Z 
2025-12-04T12:12:57.6169031Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6169270Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6169380Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6169506Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6169902Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6170132Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6170228Z graph_break []
2025-12-04T12:12:57.6170439Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6173172Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6173281Z   return x.grad, w.grad
2025-12-04T12:12:57.6174014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6174112Z   warnings.warn(
2025-12-04T12:12:57.6176775Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6176914Z   return x.grad, w.grad
2025-12-04T12:12:57.6177125Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6177249Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6177360Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6177589Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6177925Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6178024Z graph_break []
2025-12-04T12:12:57.6178246Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6180901Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6181023Z   return x.grad, w.grad
2025-12-04T12:12:57.6181741Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6181840Z   warnings.warn(
2025-12-04T12:12:57.6184500Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6184637Z   return x.grad, w.grad
2025-12-04T12:12:57.6184863Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6184969Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6185096Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6185345Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6185677Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6185788Z graph_break []
2025-12-04T12:12:57.6186002Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6186760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6186873Z   warnings.warn(
2025-12-04T12:12:57.6189521Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6189677Z   return x.grad, w.grad
2025-12-04T12:12:57.6190493Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ccdefc43a9a17fe4.xml -
2025-12-04T12:12:57.6190681Z =========================== short test summary info ============================
2025-12-04T12:12:57.6191731Z FAILED [0.1612s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6191739Z 
2025-12-04T12:12:57.6191954Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6192888Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6192896Z 
2025-12-04T12:12:57.6193161Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6193353Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6193551Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.6193648Z Got exit code 1
2025-12-04T12:12:57.6194501Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6194906Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.6195553Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f6f73f3414e84f03.xml
2025-12-04T12:12:57.6195717Z ============================= test session starts ==============================
2025-12-04T12:12:57.6196060Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6196183Z cachedir: .pytest_cache
2025-12-04T12:12:57.6196699Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6196838Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6196947Z configfile: pytest.ini
2025-12-04T12:12:57.6197557Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6197789Z collecting ... collected 380 items / 41 deselected / 339 selected
2025-12-04T12:12:57.6197931Z stepcurrent: skipping 41 already run items.
2025-12-04T12:12:57.6198075Z Running 134 items in this shard
2025-12-04T12:12:57.6198080Z 
2025-12-04T12:12:57.6199103Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.6200122Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0031s] (Skip non-critical tests to save resources.) [  1%]
2025-12-04T12:12:57.6201287Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0036s] (Skip non-critical tests to save resources.) [  2%]
2025-12-04T12:12:57.6202239Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5402s] [  2%]
2025-12-04T12:12:57.6203211Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1632s] [  2%]
2025-12-04T12:12:57.6204019Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1588s] [  2%]
2025-12-04T12:12:57.6204025Z 
2025-12-04T12:12:57.6204184Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6204730Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6204852Z Traceback (most recent call last):
2025-12-04T12:12:57.6205336Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6205532Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6205741Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6205761Z 
2025-12-04T12:12:57.6205975Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6206900Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6206905Z 
2025-12-04T12:12:57.6207175Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6207392Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6207511Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6207623Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6207957Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6208188Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6208283Z graph_break []
2025-12-04T12:12:57.6208491Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6209226Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6209326Z   warnings.warn(
2025-12-04T12:12:57.6209888Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6210058Z Traceback (most recent call last):
2025-12-04T12:12:57.6210517Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6210725Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6210973Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6210978Z 
2025-12-04T12:12:57.6211187Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6212118Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6212123Z 
2025-12-04T12:12:57.6212424Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6212652Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6212764Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6212873Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6213220Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6213437Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6213598Z graph_break []
2025-12-04T12:12:57.6213809Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6214528Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6214640Z   warnings.warn(
2025-12-04T12:12:57.6214848Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6214956Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6215084Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6215299Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6215640Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6215734Z graph_break []
2025-12-04T12:12:57.6215941Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6216674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6216771Z   warnings.warn(
2025-12-04T12:12:57.6216912Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6217475Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6217593Z Traceback (most recent call last):
2025-12-04T12:12:57.6218064Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6218259Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6218465Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6218470Z 
2025-12-04T12:12:57.6218691Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6219623Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6219629Z 
2025-12-04T12:12:57.6219900Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6220109Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6220221Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6220344Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6220673Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6220918Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6221025Z graph_break []
2025-12-04T12:12:57.6221235Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6221993Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6222097Z   warnings.warn(
2025-12-04T12:12:57.6222307Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6222429Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6222543Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6222791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6223135Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6223230Z graph_break []
2025-12-04T12:12:57.6223451Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6224167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6224294Z   warnings.warn(
2025-12-04T12:12:57.6224515Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6224623Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6224734Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6224960Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6225289Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6225398Z graph_break []
2025-12-04T12:12:57.6225607Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6226324Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6226434Z   warnings.warn(
2025-12-04T12:12:57.6227233Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f6f73f3414e84f03.xml -
2025-12-04T12:12:57.6227413Z =========================== short test summary info ============================
2025-12-04T12:12:57.6228463Z FAILED [0.1588s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6228469Z 
2025-12-04T12:12:57.6228685Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6229614Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6229621Z 
2025-12-04T12:12:57.6229879Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6230072Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6230286Z ============= 1 failed, 3 skipped, 41 deselected, 2 rerun in 4.93s =============
2025-12-04T12:12:57.6230380Z Got exit code 1
2025-12-04T12:12:57.6230494Z Retrying single test...
2025-12-04T12:12:57.6231125Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6fa30bf2f2d5eb51.xml
2025-12-04T12:12:57.6231300Z ============================= test session starts ==============================
2025-12-04T12:12:57.6231643Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6231793Z cachedir: .pytest_cache
2025-12-04T12:12:57.6232315Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6232434Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6232538Z configfile: pytest.ini
2025-12-04T12:12:57.6233157Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6233381Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6234431Z stepcurrent: skipping 44 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6234543Z Running 1 items in this shard
2025-12-04T12:12:57.6234548Z 
2025-12-04T12:12:57.6235436Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5412s] [100%]
2025-12-04T12:12:57.6236335Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1621s] [100%]
2025-12-04T12:12:57.6237166Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1599s] [100%]
2025-12-04T12:12:57.6237172Z 
2025-12-04T12:12:57.6237320Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6237865Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6237996Z Traceback (most recent call last):
2025-12-04T12:12:57.6238457Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6238651Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6238871Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6238876Z 
2025-12-04T12:12:57.6239090Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6240007Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6240024Z 
2025-12-04T12:12:57.6240285Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6240499Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6240623Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6240735Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6241069Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6241296Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6241391Z graph_break []
2025-12-04T12:12:57.6241617Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6242415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6242513Z   warnings.warn(
2025-12-04T12:12:57.6243072Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6243194Z Traceback (most recent call last):
2025-12-04T12:12:57.6243660Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6243912Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6244118Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6244123Z 
2025-12-04T12:12:57.6244347Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6245299Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6245307Z 
2025-12-04T12:12:57.6245582Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6245794Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6245903Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6246062Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6246393Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6246610Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6246717Z graph_break []
2025-12-04T12:12:57.6246929Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6247643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6247784Z   warnings.warn(
2025-12-04T12:12:57.6247993Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6248114Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6248226Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6248439Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6248780Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6248874Z graph_break []
2025-12-04T12:12:57.6249083Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6249813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6249910Z   warnings.warn(
2025-12-04T12:12:57.6250078Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6250625Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6250743Z Traceback (most recent call last):
2025-12-04T12:12:57.6251218Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6251419Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6251641Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6251649Z 
2025-12-04T12:12:57.6251856Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6252780Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6252787Z 
2025-12-04T12:12:57.6253066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6253280Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6253408Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6253521Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6253853Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6254084Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6254181Z graph_break []
2025-12-04T12:12:57.6254394Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6255162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6255264Z   warnings.warn(
2025-12-04T12:12:57.6255518Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6255630Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6255742Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6255974Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6256305Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6256402Z graph_break []
2025-12-04T12:12:57.6256675Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6257391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6257504Z   warnings.warn(
2025-12-04T12:12:57.6257713Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6257823Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6257945Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6258197Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6258524Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6258637Z graph_break []
2025-12-04T12:12:57.6258844Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6259569Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6259666Z   warnings.warn(
2025-12-04T12:12:57.6260468Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6fa30bf2f2d5eb51.xml -
2025-12-04T12:12:57.6260652Z =========================== short test summary info ============================
2025-12-04T12:12:57.6261712Z FAILED [0.1599s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6261719Z 
2025-12-04T12:12:57.6261945Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6262869Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6262875Z 
2025-12-04T12:12:57.6263134Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6263331Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6263525Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.6263632Z Got exit code 1
2025-12-04T12:12:57.6263736Z Retrying single test...
2025-12-04T12:12:57.6264363Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-38918fbd281ed213.xml
2025-12-04T12:12:57.6264531Z ============================= test session starts ==============================
2025-12-04T12:12:57.6264872Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6264976Z cachedir: .pytest_cache
2025-12-04T12:12:57.6265493Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6265616Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6265845Z configfile: pytest.ini
2025-12-04T12:12:57.6266419Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6266638Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6267686Z stepcurrent: skipping 44 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6267799Z Running 1 items in this shard
2025-12-04T12:12:57.6267804Z 
2025-12-04T12:12:57.6268741Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5561s] [100%]
2025-12-04T12:12:57.6269627Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1623s] [100%]
2025-12-04T12:12:57.6270438Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1583s] [100%]
2025-12-04T12:12:57.6270492Z 
2025-12-04T12:12:57.6270631Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6271172Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6271305Z Traceback (most recent call last):
2025-12-04T12:12:57.6271768Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6271967Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6272184Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6272191Z 
2025-12-04T12:12:57.6272398Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6273337Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6273344Z 
2025-12-04T12:12:57.6273606Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6273837Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6273947Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6274060Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6274407Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6274625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6274722Z graph_break []
2025-12-04T12:12:57.6274943Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6275658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6275771Z   warnings.warn(
2025-12-04T12:12:57.6276319Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6276438Z Traceback (most recent call last):
2025-12-04T12:12:57.6276906Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6277098Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6277305Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6277310Z 
2025-12-04T12:12:57.6277530Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6278501Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6278506Z 
2025-12-04T12:12:57.6278805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6279021Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6279129Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6279253Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6279584Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6279809Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6279934Z graph_break []
2025-12-04T12:12:57.6280145Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6280878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6280976Z   warnings.warn(
2025-12-04T12:12:57.6281184Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6281339Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6281452Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6281682Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6282010Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6282104Z graph_break []
2025-12-04T12:12:57.6282404Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6283122Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6283222Z   warnings.warn(
2025-12-04T12:12:57.6283377Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6283924Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6284064Z Traceback (most recent call last):
2025-12-04T12:12:57.6284528Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6284722Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6284943Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6284948Z 
2025-12-04T12:12:57.6285157Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6286096Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6286104Z 
2025-12-04T12:12:57.6286364Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6286573Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6286695Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6286809Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6287141Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6287365Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6287458Z graph_break []
2025-12-04T12:12:57.6287680Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6288398Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6288541Z   warnings.warn(
2025-12-04T12:12:57.6288762Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6288868Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6288977Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6289209Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6289568Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6289674Z graph_break []
2025-12-04T12:12:57.6289884Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6290596Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6290738Z   warnings.warn(
2025-12-04T12:12:57.6290948Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6291058Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6291180Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6291392Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6291733Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6291826Z graph_break []
2025-12-04T12:12:57.6292069Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6292788Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6292883Z   warnings.warn(
2025-12-04T12:12:57.6293680Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-38918fbd281ed213.xml -
2025-12-04T12:12:57.6293861Z =========================== short test summary info ============================
2025-12-04T12:12:57.6294911Z FAILED [0.1583s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6294917Z 
2025-12-04T12:12:57.6295140Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6296064Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6296069Z 
2025-12-04T12:12:57.6296339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6296516Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6296709Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.6296819Z Got exit code 1
2025-12-04T12:12:57.6297655Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6298070Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.6298697Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1f043ea296196952.xml
2025-12-04T12:12:57.6298855Z ============================= test session starts ==============================
2025-12-04T12:12:57.6299207Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6299314Z cachedir: .pytest_cache
2025-12-04T12:12:57.6299823Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6299997Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6300104Z configfile: pytest.ini
2025-12-04T12:12:57.6300698Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6301154Z collecting ... collected 380 items / 45 deselected / 335 selected
2025-12-04T12:12:57.6301301Z stepcurrent: skipping 45 already run items.
2025-12-04T12:12:57.6301426Z Running 130 items in this shard
2025-12-04T12:12:57.6301431Z 
2025-12-04T12:12:57.6302438Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0041s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.6303492Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0031s] (Skip non-critical tests to save resources.) [  1%]
2025-12-04T12:12:57.6304483Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0037s] (Skip non-critical tests to save resources.) [  2%]
2025-12-04T12:12:57.6305387Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5642s] [  3%]
2025-12-04T12:12:57.6306311Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1617s] [  3%]
2025-12-04T12:12:57.6307121Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1575s] [  3%]
2025-12-04T12:12:57.6307140Z 
2025-12-04T12:12:57.6307277Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6307820Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6307951Z Traceback (most recent call last):
2025-12-04T12:12:57.6308415Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6308610Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6308828Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6308833Z 
2025-12-04T12:12:57.6309042Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6309981Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6309988Z 
2025-12-04T12:12:57.6310250Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6310463Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6310586Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6310704Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6311050Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6311262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6311358Z graph_break []
2025-12-04T12:12:57.6311579Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6312306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6312404Z   warnings.warn(
2025-12-04T12:12:57.6313008Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6313128Z Traceback (most recent call last):
2025-12-04T12:12:57.6313602Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6313833Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6314041Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6314046Z 
2025-12-04T12:12:57.6314271Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6315225Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6315231Z 
2025-12-04T12:12:57.6315504Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6315720Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6315831Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6315958Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6316293Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6316540Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6316650Z graph_break []
2025-12-04T12:12:57.6316860Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6317594Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6317697Z   warnings.warn(
2025-12-04T12:12:57.6317907Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6318035Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6318148Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6318363Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6318703Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6318798Z graph_break []
2025-12-04T12:12:57.6319026Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6319745Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6319845Z   warnings.warn(
2025-12-04T12:12:57.6320003Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6320544Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6320662Z Traceback (most recent call last):
2025-12-04T12:12:57.6321137Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6321329Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6321549Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6321559Z 
2025-12-04T12:12:57.6321765Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6322760Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6322782Z 
2025-12-04T12:12:57.6323042Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6323256Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6323382Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6323537Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6323866Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6324098Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6324194Z graph_break []
2025-12-04T12:12:57.6324436Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6325167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6325264Z   warnings.warn(
2025-12-04T12:12:57.6325483Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6325590Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6325732Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6325963Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6326292Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6326385Z graph_break []
2025-12-04T12:12:57.6326606Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6327315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6327457Z   warnings.warn(
2025-12-04T12:12:57.6327662Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6327769Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6327890Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6328102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6328433Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6328538Z graph_break []
2025-12-04T12:12:57.6328746Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6329468Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6329566Z   warnings.warn(
2025-12-04T12:12:57.6330368Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1f043ea296196952.xml -
2025-12-04T12:12:57.6330548Z =========================== short test summary info ============================
2025-12-04T12:12:57.6331611Z FAILED [0.1575s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6331617Z 
2025-12-04T12:12:57.6331843Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6332763Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6332769Z 
2025-12-04T12:12:57.6333041Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6333219Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6333429Z ============= 1 failed, 3 skipped, 45 deselected, 2 rerun in 4.95s =============
2025-12-04T12:12:57.6333538Z Got exit code 1
2025-12-04T12:12:57.6333641Z Retrying single test...
2025-12-04T12:12:57.6334270Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5fb869340c48ef2f.xml
2025-12-04T12:12:57.6334441Z ============================= test session starts ==============================
2025-12-04T12:12:57.6334821Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6334942Z cachedir: .pytest_cache
2025-12-04T12:12:57.6335452Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6335602Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6335722Z configfile: pytest.ini
2025-12-04T12:12:57.6336297Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6336518Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6337561Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6337674Z Running 1 items in this shard
2025-12-04T12:12:57.6337681Z 
2025-12-04T12:12:57.6338582Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5368s] [100%]
2025-12-04T12:12:57.6339470Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1600s] [100%]
2025-12-04T12:12:57.6340324Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1588s] [100%]
2025-12-04T12:12:57.6340329Z 
2025-12-04T12:12:57.6340469Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6341013Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6341148Z Traceback (most recent call last):
2025-12-04T12:12:57.6341609Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6341816Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6342026Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6342031Z 
2025-12-04T12:12:57.6342237Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6343172Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6343176Z 
2025-12-04T12:12:57.6343440Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6343667Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6343777Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6343889Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6344237Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6344454Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6344554Z graph_break []
2025-12-04T12:12:57.6344782Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6345503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6345617Z   warnings.warn(
2025-12-04T12:12:57.6346174Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6346293Z Traceback (most recent call last):
2025-12-04T12:12:57.6346768Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6347020Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6347238Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6347243Z 
2025-12-04T12:12:57.6347480Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6348413Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6348418Z 
2025-12-04T12:12:57.6348689Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6348927Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6349049Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6349161Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6349494Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6349718Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6349815Z graph_break []
2025-12-04T12:12:57.6350028Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6350798Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6350897Z   warnings.warn(
2025-12-04T12:12:57.6351117Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6351223Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6351335Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6351561Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6351891Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6351988Z graph_break []
2025-12-04T12:12:57.6352206Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6352920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6353033Z   warnings.warn(
2025-12-04T12:12:57.6353173Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6353717Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6353847Z Traceback (most recent call last):
2025-12-04T12:12:57.6354304Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6354496Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6354714Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6354719Z 
2025-12-04T12:12:57.6354925Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6355858Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6355865Z 
2025-12-04T12:12:57.6356122Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6356333Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6356453Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6356562Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6356903Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6357119Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6357249Z graph_break []
2025-12-04T12:12:57.6357468Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6358187Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6358317Z   warnings.warn(
2025-12-04T12:12:57.6358540Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6358647Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6358772Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6358986Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6359346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6359456Z graph_break []
2025-12-04T12:12:57.6359666Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6360380Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6360495Z   warnings.warn(
2025-12-04T12:12:57.6360705Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6360860Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6360970Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6361185Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6361531Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6361627Z graph_break []
2025-12-04T12:12:57.6361839Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6362633Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6362735Z   warnings.warn(
2025-12-04T12:12:57.6363540Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5fb869340c48ef2f.xml -
2025-12-04T12:12:57.6363711Z =========================== short test summary info ============================
2025-12-04T12:12:57.6364771Z FAILED [0.1588s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6364792Z 
2025-12-04T12:12:57.6365002Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6365919Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6365927Z 
2025-12-04T12:12:57.6366199Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6366372Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6366564Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:57.6366678Z Got exit code 1
2025-12-04T12:12:57.6366782Z Retrying single test...
2025-12-04T12:12:57.6367420Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0d5e946f00308484.xml
2025-12-04T12:12:57.6367576Z ============================= test session starts ==============================
2025-12-04T12:12:57.6367917Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6368037Z cachedir: .pytest_cache
2025-12-04T12:12:57.6368546Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6368735Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6368840Z configfile: pytest.ini
2025-12-04T12:12:57.6369415Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6369683Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6370692Z stepcurrent: skipping 48 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6370806Z Running 1 items in this shard
2025-12-04T12:12:57.6370826Z 
2025-12-04T12:12:57.6371747Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5476s] [100%]
2025-12-04T12:12:57.6372639Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1611s] [100%]
2025-12-04T12:12:57.6373466Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1606s] [100%]
2025-12-04T12:12:57.6373522Z 
2025-12-04T12:12:57.6373663Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6374224Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6374346Z Traceback (most recent call last):
2025-12-04T12:12:57.6374810Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6375023Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6375229Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6375234Z 
2025-12-04T12:12:57.6375458Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6376386Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6376393Z 
2025-12-04T12:12:57.6376652Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6376881Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6376993Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6377122Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6377455Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6377673Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6377785Z graph_break []
2025-12-04T12:12:57.6377996Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6378721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6378838Z   warnings.warn(
2025-12-04T12:12:57.6379386Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6379521Z Traceback (most recent call last):
2025-12-04T12:12:57.6379983Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6380176Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6380392Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6380429Z 
2025-12-04T12:12:57.6380638Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6381599Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6381608Z 
2025-12-04T12:12:57.6381869Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6382080Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6382201Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6382317Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6382676Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6382910Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6383010Z graph_break []
2025-12-04T12:12:57.6383240Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6383959Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6384060Z   warnings.warn(
2025-12-04T12:12:57.6384336Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6384442Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6384556Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6384786Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6385111Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6385221Z graph_break []
2025-12-04T12:12:57.6385434Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6386147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6386258Z   warnings.warn(
2025-12-04T12:12:57.6386395Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6386944Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6387077Z Traceback (most recent call last):
2025-12-04T12:12:57.6387539Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6387742Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6387946Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6387953Z 
2025-12-04T12:12:57.6388157Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6389092Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6389099Z 
2025-12-04T12:12:57.6389361Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6389588Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6389701Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6389813Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6390151Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6390362Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6390468Z graph_break []
2025-12-04T12:12:57.6390680Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6391393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6391541Z   warnings.warn(
2025-12-04T12:12:57.6391748Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6391856Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6392012Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6392230Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6392571Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6392667Z graph_break []
2025-12-04T12:12:57.6392875Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6393623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6393722Z   warnings.warn(
2025-12-04T12:12:57.6393932Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6394053Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6394164Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6394376Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6394718Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6394846Z graph_break []
2025-12-04T12:12:57.6395067Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6395773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6395869Z   warnings.warn(
2025-12-04T12:12:57.6396678Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0d5e946f00308484.xml -
2025-12-04T12:12:57.6396847Z =========================== short test summary info ============================
2025-12-04T12:12:57.6397911Z FAILED [0.1606s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6397919Z 
2025-12-04T12:12:57.6398131Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6399050Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6399069Z 
2025-12-04T12:12:57.6399331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6399506Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6399717Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.6399813Z Got exit code 1
2025-12-04T12:12:57.6400651Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6401236Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.6401861Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c576f4628ae22849.xml
2025-12-04T12:12:57.6402040Z ============================= test session starts ==============================
2025-12-04T12:12:57.6402480Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6402591Z cachedir: .pytest_cache
2025-12-04T12:12:57.6403114Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6403317Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6403439Z configfile: pytest.ini
2025-12-04T12:12:57.6404059Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6404284Z collecting ... collected 380 items / 49 deselected / 331 selected
2025-12-04T12:12:57.6404439Z stepcurrent: skipping 49 already run items.
2025-12-04T12:12:57.6404553Z Running 126 items in this shard
2025-12-04T12:12:57.6404558Z 
2025-12-04T12:12:57.6405597Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.6406604Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0030s] (Skip non-critical tests to save resources.) [  1%]
2025-12-04T12:12:57.6407598Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0036s] (Skip non-critical tests to save resources.) [  2%]
2025-12-04T12:12:57.6408532Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5452s] [  3%]
2025-12-04T12:12:57.6409418Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1612s] [  3%]
2025-12-04T12:12:57.6410234Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1589s] [  3%]
2025-12-04T12:12:57.6410242Z 
2025-12-04T12:12:57.6410379Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6410939Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6411059Z Traceback (most recent call last):
2025-12-04T12:12:57.6411524Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6411727Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6411933Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6411939Z 
2025-12-04T12:12:57.6412164Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6413083Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6413090Z 
2025-12-04T12:12:57.6413349Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6413577Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6413685Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6413798Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6414144Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6414360Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6414468Z graph_break []
2025-12-04T12:12:57.6414678Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6415394Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6415537Z   warnings.warn(
2025-12-04T12:12:57.6416087Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6416218Z Traceback (most recent call last):
2025-12-04T12:12:57.6416706Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6416901Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6417117Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6417122Z 
2025-12-04T12:12:57.6417329Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6418277Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6418300Z 
2025-12-04T12:12:57.6418558Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6418769Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6418886Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6418997Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6419357Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6419584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6419679Z graph_break []
2025-12-04T12:12:57.6419902Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6420623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6420721Z   warnings.warn(
2025-12-04T12:12:57.6420946Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6421052Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6421165Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6421390Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6421715Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6421823Z graph_break []
2025-12-04T12:12:57.6422034Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6422743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6422852Z   warnings.warn(
2025-12-04T12:12:57.6422997Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6423542Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6423676Z Traceback (most recent call last):
2025-12-04T12:12:57.6424140Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6424348Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6424559Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6424564Z 
2025-12-04T12:12:57.6424771Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6425704Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6425711Z 
2025-12-04T12:12:57.6425973Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6426193Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6426337Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6426449Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6439126Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6439632Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6439746Z graph_break []
2025-12-04T12:12:57.6439992Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6440726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6440827Z   warnings.warn(
2025-12-04T12:12:57.6441132Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6441245Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6441375Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6441601Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6441935Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6442048Z graph_break []
2025-12-04T12:12:57.6442363Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6443127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6443247Z   warnings.warn(
2025-12-04T12:12:57.6443458Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6443586Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6443702Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6443917Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6444264Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6444365Z graph_break []
2025-12-04T12:12:57.6444577Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6445306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6445408Z   warnings.warn(
2025-12-04T12:12:57.6446219Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c576f4628ae22849.xml -
2025-12-04T12:12:57.6446388Z =========================== short test summary info ============================
2025-12-04T12:12:57.6447456Z FAILED [0.1589s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6447477Z 
2025-12-04T12:12:57.6447688Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6448610Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6448619Z 
2025-12-04T12:12:57.6448897Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6449074Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6449284Z ============= 1 failed, 3 skipped, 49 deselected, 2 rerun in 4.93s =============
2025-12-04T12:12:57.6449390Z Got exit code 1
2025-12-04T12:12:57.6449495Z Retrying single test...
2025-12-04T12:12:57.6450128Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-33928f913f155d05.xml
2025-12-04T12:12:57.6450333Z ============================= test session starts ==============================
2025-12-04T12:12:57.6450676Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6450797Z cachedir: .pytest_cache
2025-12-04T12:12:57.6451334Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6451459Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6451580Z configfile: pytest.ini
2025-12-04T12:12:57.6452158Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6452395Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6453425Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6453546Z Running 1 items in this shard
2025-12-04T12:12:57.6453551Z 
2025-12-04T12:12:57.6454462Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5603s] [100%]
2025-12-04T12:12:57.6455450Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1629s] [100%]
2025-12-04T12:12:57.6456270Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1593s] [100%]
2025-12-04T12:12:57.6456275Z 
2025-12-04T12:12:57.6456413Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6456972Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6457093Z Traceback (most recent call last):
2025-12-04T12:12:57.6457551Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6457762Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6457972Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6457977Z 
2025-12-04T12:12:57.6458183Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6459118Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6459123Z 
2025-12-04T12:12:57.6459384Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6459610Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6459719Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6459832Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6460178Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6460395Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6460503Z graph_break []
2025-12-04T12:12:57.6460714Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6461432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6461546Z   warnings.warn(
2025-12-04T12:12:57.6462088Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6462238Z Traceback (most recent call last):
2025-12-04T12:12:57.6462709Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6462903Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6463151Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6463159Z 
2025-12-04T12:12:57.6463371Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6464293Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6464311Z 
2025-12-04T12:12:57.6464599Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6464812Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6464937Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6465049Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6465380Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6465609Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6465737Z graph_break []
2025-12-04T12:12:57.6465946Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6466676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6466774Z   warnings.warn(
2025-12-04T12:12:57.6467007Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6467168Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6467327Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6467575Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6467960Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6468058Z graph_break []
2025-12-04T12:12:57.6468278Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6468994Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6469108Z   warnings.warn(
2025-12-04T12:12:57.6469246Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6469787Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6469921Z Traceback (most recent call last):
2025-12-04T12:12:57.6470380Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6470591Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6470796Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6470801Z 
2025-12-04T12:12:57.6471010Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6471944Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6471951Z 
2025-12-04T12:12:57.6472209Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6472433Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6472544Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6472657Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6473003Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6473290Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6473384Z graph_break []
2025-12-04T12:12:57.6473608Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6474355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6474472Z   warnings.warn(
2025-12-04T12:12:57.6474680Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6474786Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6474913Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6475156Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6475486Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6475594Z graph_break []
2025-12-04T12:12:57.6475802Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6476522Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6476621Z   warnings.warn(
2025-12-04T12:12:57.6476864Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6476983Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6477093Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6477304Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6477643Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6477740Z graph_break []
2025-12-04T12:12:57.6477961Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6478671Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6478769Z   warnings.warn(
2025-12-04T12:12:57.6479577Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-33928f913f155d05.xml -
2025-12-04T12:12:57.6479746Z =========================== short test summary info ============================
2025-12-04T12:12:57.6480815Z FAILED [0.1593s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6480822Z 
2025-12-04T12:12:57.6481038Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6481964Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6481971Z 
2025-12-04T12:12:57.6482320Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6482500Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6482715Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.6482816Z Got exit code 1
2025-12-04T12:12:57.6482922Z Retrying single test...
2025-12-04T12:12:57.6483561Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-56a939c64c979699.xml
2025-12-04T12:12:57.6483722Z ============================= test session starts ==============================
2025-12-04T12:12:57.6484068Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6484236Z cachedir: .pytest_cache
2025-12-04T12:12:57.6484751Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6484887Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6484996Z configfile: pytest.ini
2025-12-04T12:12:57.6485623Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6485867Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6486873Z stepcurrent: skipping 52 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6487031Z Running 1 items in this shard
2025-12-04T12:12:57.6487037Z 
2025-12-04T12:12:57.6487923Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5652s] [100%]
2025-12-04T12:12:57.6488805Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1628s] [100%]
2025-12-04T12:12:57.6489650Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1607s] [100%]
2025-12-04T12:12:57.6489655Z 
2025-12-04T12:12:57.6489792Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6490347Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6490467Z Traceback (most recent call last):
2025-12-04T12:12:57.6490931Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6491138Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6491344Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6491349Z 
2025-12-04T12:12:57.6491568Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6492490Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6492495Z 
2025-12-04T12:12:57.6492771Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6492985Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6493097Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6493221Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6493556Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6493769Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6493877Z graph_break []
2025-12-04T12:12:57.6494083Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6494819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6494920Z   warnings.warn(
2025-12-04T12:12:57.6495468Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6495602Z Traceback (most recent call last):
2025-12-04T12:12:57.6496061Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6496251Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6496513Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6496518Z 
2025-12-04T12:12:57.6496726Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6497689Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6497697Z 
2025-12-04T12:12:57.6497957Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6498167Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6498288Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6498429Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6498775Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6498990Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6499084Z graph_break []
2025-12-04T12:12:57.6499309Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6500026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6500153Z   warnings.warn(
2025-12-04T12:12:57.6500376Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6500485Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6500611Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6501035Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6501421Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6501533Z graph_break []
2025-12-04T12:12:57.6501743Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6502458Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6502572Z   warnings.warn(
2025-12-04T12:12:57.6502713Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6503271Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6503393Z Traceback (most recent call last):
2025-12-04T12:12:57.6503852Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6504062Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6504270Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6504276Z 
2025-12-04T12:12:57.6504496Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6505415Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6505420Z 
2025-12-04T12:12:57.6505686Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6505911Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6506019Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6506136Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6506478Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6506692Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6506804Z graph_break []
2025-12-04T12:12:57.6507013Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6507812Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6507923Z   warnings.warn(
2025-12-04T12:12:57.6508176Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6508289Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6508414Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6508627Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6508966Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6509062Z graph_break []
2025-12-04T12:12:57.6509320Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6510048Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6510149Z   warnings.warn(
2025-12-04T12:12:57.6510355Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6510474Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6510584Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6510854Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6511184Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6511280Z graph_break []
2025-12-04T12:12:57.6511501Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6512213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6512310Z   warnings.warn(
2025-12-04T12:12:57.6513115Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-56a939c64c979699.xml -
2025-12-04T12:12:57.6513284Z =========================== short test summary info ============================
2025-12-04T12:12:57.6514349Z FAILED [0.1607s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6514357Z 
2025-12-04T12:12:57.6514568Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6515508Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6515514Z 
2025-12-04T12:12:57.6515776Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6515952Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6516158Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.6516254Z Got exit code 1
2025-12-04T12:12:57.6517106Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6517507Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.6518132Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d3b58dae1e6fa80b.xml
2025-12-04T12:12:57.6518305Z ============================= test session starts ==============================
2025-12-04T12:12:57.6518646Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6518799Z cachedir: .pytest_cache
2025-12-04T12:12:57.6519311Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6519434Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6519552Z configfile: pytest.ini
2025-12-04T12:12:57.6520159Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6520384Z collecting ... collected 380 items / 53 deselected / 327 selected
2025-12-04T12:12:57.6520536Z stepcurrent: skipping 53 already run items.
2025-12-04T12:12:57.6520648Z Running 122 items in this shard
2025-12-04T12:12:57.6520653Z 
2025-12-04T12:12:57.6521700Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0041s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.6522659Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5633s] [  1%]
2025-12-04T12:12:57.6523545Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1611s] [  1%]
2025-12-04T12:12:57.6524401Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1614s] [  1%]
2025-12-04T12:12:57.6524407Z 
2025-12-04T12:12:57.6524551Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6525111Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6525234Z Traceback (most recent call last):
2025-12-04T12:12:57.6525714Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6525910Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6526123Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6526130Z 
2025-12-04T12:12:57.6526355Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6527278Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6527283Z 
2025-12-04T12:12:57.6527558Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6527772Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6527884Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6528011Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6528344Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6528559Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6528669Z graph_break []
2025-12-04T12:12:57.6528881Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6529611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6529711Z   warnings.warn(
2025-12-04T12:12:57.6530267Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6530397Z Traceback (most recent call last):
2025-12-04T12:12:57.6530855Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6531118Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6531333Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6531339Z 
2025-12-04T12:12:57.6531547Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6532520Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6532526Z 
2025-12-04T12:12:57.6532786Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6533027Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6533150Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6533264Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6533610Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6533825Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6533919Z graph_break []
2025-12-04T12:12:57.6534142Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6534894Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6534991Z   warnings.warn(
2025-12-04T12:12:57.6535210Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6535322Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6535444Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6535660Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6535987Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6536095Z graph_break []
2025-12-04T12:12:57.6536304Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6537014Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6537126Z   warnings.warn(
2025-12-04T12:12:57.6537267Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6537821Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6537939Z Traceback (most recent call last):
2025-12-04T12:12:57.6538401Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6538601Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6538807Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6538813Z 
2025-12-04T12:12:57.6539034Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6539956Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6539963Z 
2025-12-04T12:12:57.6540224Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6540444Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6540552Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6540676Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6541005Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6541223Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6541368Z graph_break []
2025-12-04T12:12:57.6541578Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6542296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6542442Z   warnings.warn(
2025-12-04T12:12:57.6542653Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6542775Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6542890Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6543104Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6543446Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6543573Z graph_break []
2025-12-04T12:12:57.6543781Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6544503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6544601Z   warnings.warn(
2025-12-04T12:12:57.6544823Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6544969Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6545080Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6545308Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6545635Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6545730Z graph_break []
2025-12-04T12:12:57.6545949Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6546660Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6546769Z   warnings.warn(
2025-12-04T12:12:57.6547572Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d3b58dae1e6fa80b.xml -
2025-12-04T12:12:57.6547741Z =========================== short test summary info ============================
2025-12-04T12:12:57.6548803Z FAILED [0.1614s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6548808Z 
2025-12-04T12:12:57.6549020Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6549963Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6549971Z 
2025-12-04T12:12:57.6550230Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6550405Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6550625Z ============= 1 failed, 1 skipped, 53 deselected, 2 rerun in 4.94s =============
2025-12-04T12:12:57.6550727Z Got exit code 1
2025-12-04T12:12:57.6550844Z Retrying single test...
2025-12-04T12:12:57.6551476Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ece32ee31ed5f94b.xml
2025-12-04T12:12:57.6551638Z ============================= test session starts ==============================
2025-12-04T12:12:57.6551996Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6552104Z cachedir: .pytest_cache
2025-12-04T12:12:57.6552610Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6552783Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6552888Z configfile: pytest.ini
2025-12-04T12:12:57.6553473Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6553729Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6554732Z stepcurrent: skipping 54 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6554858Z Running 1 items in this shard
2025-12-04T12:12:57.6554863Z 
2025-12-04T12:12:57.6555790Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5540s] [100%]
2025-12-04T12:12:57.6556698Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1609s] [100%]
2025-12-04T12:12:57.6557508Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1584s] [100%]
2025-12-04T12:12:57.6557549Z 
2025-12-04T12:12:57.6557700Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6558250Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6558369Z Traceback (most recent call last):
2025-12-04T12:12:57.6558843Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6559051Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6559268Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6559273Z 
2025-12-04T12:12:57.6559481Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6560402Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6560409Z 
2025-12-04T12:12:57.6560683Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6560898Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6561024Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6561139Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6561470Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6561700Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6561800Z graph_break []
2025-12-04T12:12:57.6562010Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6562819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6562921Z   warnings.warn(
2025-12-04T12:12:57.6563477Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6563598Z Traceback (most recent call last):
2025-12-04T12:12:57.6564058Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6564270Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6564476Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6564525Z 
2025-12-04T12:12:57.6564751Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6565676Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6565725Z 
2025-12-04T12:12:57.6565991Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6566216Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6566327Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6566451Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6566785Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6567029Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6567144Z graph_break []
2025-12-04T12:12:57.6567359Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6568080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6568195Z   warnings.warn(
2025-12-04T12:12:57.6568406Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6568575Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6568691Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6568907Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6569244Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6569338Z graph_break []
2025-12-04T12:12:57.6569551Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6570273Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6570377Z   warnings.warn(
2025-12-04T12:12:57.6570529Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6571073Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6571193Z Traceback (most recent call last):
2025-12-04T12:12:57.6571662Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6571855Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6572058Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6572063Z 
2025-12-04T12:12:57.6572283Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6573200Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6573208Z 
2025-12-04T12:12:57.6573478Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6573690Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6573799Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6573923Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6574251Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6574473Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6574568Z graph_break []
2025-12-04T12:12:57.6574778Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6575504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6575646Z   warnings.warn(
2025-12-04T12:12:57.6575852Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6575973Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6576085Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6576341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6576669Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6576760Z graph_break []
2025-12-04T12:12:57.6576982Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6577726Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6577827Z   warnings.warn(
2025-12-04T12:12:57.6578052Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6578160Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6578286Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6578502Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6578830Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6578970Z graph_break []
2025-12-04T12:12:57.6579177Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6579886Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6579996Z   warnings.warn(
2025-12-04T12:12:57.6580797Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ece32ee31ed5f94b.xml -
2025-12-04T12:12:57.6580982Z =========================== short test summary info ============================
2025-12-04T12:12:57.6582035Z FAILED [0.1584s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6582046Z 
2025-12-04T12:12:57.6582257Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6583194Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6583200Z 
2025-12-04T12:12:57.6583461Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6583647Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6583840Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.6583935Z Got exit code 1
2025-12-04T12:12:57.6584049Z Retrying single test...
2025-12-04T12:12:57.6584677Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ff6bee4ccf71b3b1.xml
2025-12-04T12:12:57.6584850Z ============================= test session starts ==============================
2025-12-04T12:12:57.6585190Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6585295Z cachedir: .pytest_cache
2025-12-04T12:12:57.6585817Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6585939Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6586045Z configfile: pytest.ini
2025-12-04T12:12:57.6586635Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6586897Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6587944Z stepcurrent: skipping 54 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6588059Z Running 1 items in this shard
2025-12-04T12:12:57.6588064Z 
2025-12-04T12:12:57.6588966Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5363s] [100%]
2025-12-04T12:12:57.6589882Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1585s] [100%]
2025-12-04T12:12:57.6590696Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1578s] [100%]
2025-12-04T12:12:57.6590701Z 
2025-12-04T12:12:57.6590849Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6591430Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6591569Z Traceback (most recent call last):
2025-12-04T12:12:57.6592032Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6592228Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6592456Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6592460Z 
2025-12-04T12:12:57.6592671Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6593608Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6593613Z 
2025-12-04T12:12:57.6593876Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6594101Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6594225Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6594339Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6594683Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6594901Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6595001Z graph_break []
2025-12-04T12:12:57.6595229Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6595947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6596047Z   warnings.warn(
2025-12-04T12:12:57.6596606Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6596731Z Traceback (most recent call last):
2025-12-04T12:12:57.6597203Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6597396Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6597604Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6597609Z 
2025-12-04T12:12:57.6597833Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6598750Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6598797Z 
2025-12-04T12:12:57.6599070Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6599283Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6599422Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6599548Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6599877Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6600089Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6600194Z graph_break []
2025-12-04T12:12:57.6600405Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6601427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6601532Z   warnings.warn(
2025-12-04T12:12:57.6601742Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6601864Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6601975Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6602256Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6602655Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6602748Z graph_break []
2025-12-04T12:12:57.6602970Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6603684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6603780Z   warnings.warn(
2025-12-04T12:12:57.6603934Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6604478Z _ MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6604599Z Traceback (most recent call last):
2025-12-04T12:12:57.6605068Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6605263Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6605480Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6605485Z 
2025-12-04T12:12:57.6605696Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6606614Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6606632Z 
2025-12-04T12:12:57.6606891Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6607100Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6607220Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6607334Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6607665Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6607893Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6607988Z graph_break []
2025-12-04T12:12:57.6608196Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6608923Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6609026Z   warnings.warn(
2025-12-04T12:12:57.6609247Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6609413Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6609524Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6609747Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6610074Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6610169Z graph_break []
2025-12-04T12:12:57.6610433Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6611148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6611257Z   warnings.warn(
2025-12-04T12:12:57.6611464Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6611617Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6611742Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6611956Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6612282Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6612391Z graph_break []
2025-12-04T12:12:57.6612600Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6613323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6613452Z   warnings.warn(
2025-12-04T12:12:57.6614259Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ff6bee4ccf71b3b1.xml -
2025-12-04T12:12:57.6614438Z =========================== short test summary info ============================
2025-12-04T12:12:57.6615495Z FAILED [0.1578s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6615503Z 
2025-12-04T12:12:57.6615731Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6616653Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6616661Z 
2025-12-04T12:12:57.6616933Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6617108Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6617304Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:57.6617418Z Got exit code 1
2025-12-04T12:12:57.6618251Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6618655Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.6619292Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-79a731795b247695.xml
2025-12-04T12:12:57.6619454Z ============================= test session starts ==============================
2025-12-04T12:12:57.6619806Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6619914Z cachedir: .pytest_cache
2025-12-04T12:12:57.6620420Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6620555Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6620661Z configfile: pytest.ini
2025-12-04T12:12:57.6621250Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6621509Z collecting ... collected 380 items / 55 deselected / 325 selected
2025-12-04T12:12:57.6621648Z stepcurrent: skipping 55 already run items.
2025-12-04T12:12:57.6621774Z Running 120 items in this shard
2025-12-04T12:12:57.6621781Z 
2025-12-04T12:12:57.6622818Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0041s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.6623860Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0030s] (Skip non-critical tests to save resources.) [  1%]
2025-12-04T12:12:57.6624740Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5466s] [  2%]
2025-12-04T12:12:57.6625618Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1625s] [  2%]
2025-12-04T12:12:57.6626471Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1607s] [  2%]
2025-12-04T12:12:57.6626477Z 
2025-12-04T12:12:57.6626616Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6627176Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6627298Z Traceback (most recent call last):
2025-12-04T12:12:57.6627761Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6627973Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6628180Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6628185Z 
2025-12-04T12:12:57.6628409Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6629334Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6629339Z 
2025-12-04T12:12:57.6629611Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6629831Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6629941Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6630066Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6630395Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6630610Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6630718Z graph_break []
2025-12-04T12:12:57.6630926Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6633592Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6633695Z   return x.grad, w.grad
2025-12-04T12:12:57.6634446Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6634555Z   warnings.warn(
2025-12-04T12:12:57.6637230Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6637459Z   return x.grad, w.grad
2025-12-04T12:12:57.6638001Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6638138Z Traceback (most recent call last):
2025-12-04T12:12:57.6638592Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6638787Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6639036Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6639042Z 
2025-12-04T12:12:57.6639249Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6640185Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6640191Z 
2025-12-04T12:12:57.6640453Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6640664Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6640788Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6640900Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6641250Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6641461Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6641561Z graph_break []
2025-12-04T12:12:57.6641780Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6644495Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6644616Z   return x.grad, w.grad
2025-12-04T12:12:57.6645337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6645436Z   warnings.warn(
2025-12-04T12:12:57.6648099Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6648244Z   return x.grad, w.grad
2025-12-04T12:12:57.6648471Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6648581Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6648692Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6648953Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6649288Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6649394Z graph_break []
2025-12-04T12:12:57.6649603Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6652269Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6652389Z   return x.grad, w.grad
2025-12-04T12:12:57.6653136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6653248Z   warnings.warn(
2025-12-04T12:12:57.6655887Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6656004Z   return x.grad, w.grad
2025-12-04T12:12:57.6656145Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6656685Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6656814Z Traceback (most recent call last):
2025-12-04T12:12:57.6657272Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6657476Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6657682Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6657688Z 
2025-12-04T12:12:57.6657893Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6658832Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6658837Z 
2025-12-04T12:12:57.6659099Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6659322Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6659430Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6659540Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6659884Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6660098Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6660207Z graph_break []
2025-12-04T12:12:57.6660417Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6663180Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6663300Z   return x.grad, w.grad
2025-12-04T12:12:57.6664044Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6664157Z   warnings.warn(
2025-12-04T12:12:57.6666785Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6666930Z   return x.grad, w.grad
2025-12-04T12:12:57.6667141Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6667248Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6667374Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6667593Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6667924Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6668038Z graph_break []
2025-12-04T12:12:57.6668249Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6670888Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6670994Z   return x.grad, w.grad
2025-12-04T12:12:57.6671721Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6671819Z   warnings.warn(
2025-12-04T12:12:57.6674457Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6674573Z   return x.grad, w.grad
2025-12-04T12:12:57.6674781Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6674903Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6675013Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6675231Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6675613Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6675708Z graph_break []
2025-12-04T12:12:57.6675927Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6676668Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6676769Z   warnings.warn(
2025-12-04T12:12:57.6679448Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6679555Z   return x.grad, w.grad
2025-12-04T12:12:57.6680370Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-79a731795b247695.xml -
2025-12-04T12:12:57.6680568Z =========================== short test summary info ============================
2025-12-04T12:12:57.6681624Z FAILED [0.1607s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6681629Z 
2025-12-04T12:12:57.6681840Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6682822Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6682832Z 
2025-12-04T12:12:57.6683107Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6683284Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6683511Z ============= 1 failed, 2 skipped, 55 deselected, 2 rerun in 4.93s =============
2025-12-04T12:12:57.6683605Z Got exit code 1
2025-12-04T12:12:57.6683709Z Retrying single test...
2025-12-04T12:12:57.6684344Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-964095f569ab5f18.xml
2025-12-04T12:12:57.6684503Z ============================= test session starts ==============================
2025-12-04T12:12:57.6684845Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6684965Z cachedir: .pytest_cache
2025-12-04T12:12:57.6685471Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6685601Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6685705Z configfile: pytest.ini
2025-12-04T12:12:57.6686281Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6686515Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6687522Z stepcurrent: skipping 57 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6687647Z Running 1 items in this shard
2025-12-04T12:12:57.6687652Z 
2025-12-04T12:12:57.6688537Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5642s] [100%]
2025-12-04T12:12:57.6689484Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1645s] [100%]
2025-12-04T12:12:57.6690294Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1644s] [100%]
2025-12-04T12:12:57.6690300Z 
2025-12-04T12:12:57.6690436Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6691016Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6691135Z Traceback (most recent call last):
2025-12-04T12:12:57.6691610Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6691802Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6692009Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6692014Z 
2025-12-04T12:12:57.6692236Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6693187Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6693192Z 
2025-12-04T12:12:57.6693467Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6693681Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6693790Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6693921Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6694259Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6694473Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6694582Z graph_break []
2025-12-04T12:12:57.6694794Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6697461Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6697568Z   return x.grad, w.grad
2025-12-04T12:12:57.6698298Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6698396Z   warnings.warn(
2025-12-04T12:12:57.6701292Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6701418Z   return x.grad, w.grad
2025-12-04T12:12:57.6701956Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6702163Z Traceback (most recent call last):
2025-12-04T12:12:57.6702622Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6702856Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6703083Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6703089Z 
2025-12-04T12:12:57.6703298Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6704265Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6704271Z 
2025-12-04T12:12:57.6704533Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6704750Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6704875Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6704989Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6705345Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6705605Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6705703Z graph_break []
2025-12-04T12:12:57.6705929Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6708579Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6708699Z   return x.grad, w.grad
2025-12-04T12:12:57.6709419Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6709520Z   warnings.warn(
2025-12-04T12:12:57.6712169Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6712278Z   return x.grad, w.grad
2025-12-04T12:12:57.6712506Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6712619Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6712749Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6712974Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6713308Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6713416Z graph_break []
2025-12-04T12:12:57.6713634Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6716300Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6716487Z   return x.grad, w.grad
2025-12-04T12:12:57.6717209Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6717320Z   warnings.warn(
2025-12-04T12:12:57.6719997Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6720122Z   return x.grad, w.grad
2025-12-04T12:12:57.6720268Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6720851Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6720973Z Traceback (most recent call last):
2025-12-04T12:12:57.6721434Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6721646Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6721853Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6721858Z 
2025-12-04T12:12:57.6722072Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6723066Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6723075Z 
2025-12-04T12:12:57.6723338Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6723564Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6723673Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6723784Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6724130Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6724345Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6724458Z graph_break []
2025-12-04T12:12:57.6724669Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6727306Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6727427Z   return x.grad, w.grad
2025-12-04T12:12:57.6728140Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6728252Z   warnings.warn(
2025-12-04T12:12:57.6730929Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6731079Z   return x.grad, w.grad
2025-12-04T12:12:57.6731287Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6731394Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6731520Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6731765Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6732111Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6732207Z graph_break []
2025-12-04T12:12:57.6732416Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6735052Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6735188Z   return x.grad, w.grad
2025-12-04T12:12:57.6735910Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6736011Z   warnings.warn(
2025-12-04T12:12:57.6738654Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6738763Z   return x.grad, w.grad
2025-12-04T12:12:57.6738975Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6739097Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6739209Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6739428Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6739771Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6739869Z graph_break []
2025-12-04T12:12:57.6740091Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6740801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6740901Z   warnings.warn(
2025-12-04T12:12:57.6743552Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6743696Z   return x.grad, w.grad
2025-12-04T12:12:57.6744529Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-964095f569ab5f18.xml -
2025-12-04T12:12:57.6744704Z =========================== short test summary info ============================
2025-12-04T12:12:57.6745792Z FAILED [0.1644s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6745799Z 
2025-12-04T12:12:57.6746013Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6746932Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6746951Z 
2025-12-04T12:12:57.6747208Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6747418Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6747625Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ==================
2025-12-04T12:12:57.6747722Z Got exit code 1
2025-12-04T12:12:57.6747826Z Retrying single test...
2025-12-04T12:12:57.6748465Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c7f8a2bcbf5a7d94.xml
2025-12-04T12:12:57.6748624Z ============================= test session starts ==============================
2025-12-04T12:12:57.6748981Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6749089Z cachedir: .pytest_cache
2025-12-04T12:12:57.6749593Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6749727Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6749837Z configfile: pytest.ini
2025-12-04T12:12:57.6750538Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6750775Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6751779Z stepcurrent: skipping 57 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6751906Z Running 1 items in this shard
2025-12-04T12:12:57.6751912Z 
2025-12-04T12:12:57.6752796Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5480s] [100%]
2025-12-04T12:12:57.6753689Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1656s] [100%]
2025-12-04T12:12:57.6754490Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1648s] [100%]
2025-12-04T12:12:57.6754496Z 
2025-12-04T12:12:57.6754630Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6755184Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6755304Z Traceback (most recent call last):
2025-12-04T12:12:57.6755823Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6756018Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6756224Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6756259Z 
2025-12-04T12:12:57.6756481Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6757396Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6757401Z 
2025-12-04T12:12:57.6757677Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6757919Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6758029Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6758155Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6758488Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6758703Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6758810Z graph_break []
2025-12-04T12:12:57.6759067Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6761735Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6761841Z   return x.grad, w.grad
2025-12-04T12:12:57.6762638Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6762740Z   warnings.warn(
2025-12-04T12:12:57.6765375Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6765493Z   return x.grad, w.grad
2025-12-04T12:12:57.6766033Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6766167Z Traceback (most recent call last):
2025-12-04T12:12:57.6766627Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6766824Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6767045Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6767050Z 
2025-12-04T12:12:57.6767259Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6768192Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6768197Z 
2025-12-04T12:12:57.6768458Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6768711Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6768832Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6768945Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6769315Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6769530Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6769624Z graph_break []
2025-12-04T12:12:57.6769845Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6772515Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6772635Z   return x.grad, w.grad
2025-12-04T12:12:57.6773350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6773478Z   warnings.warn(
2025-12-04T12:12:57.6776127Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6776234Z   return x.grad, w.grad
2025-12-04T12:12:57.6776457Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6776565Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6776694Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6776910Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6777240Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6777351Z graph_break []
2025-12-04T12:12:57.6777559Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6780209Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6780318Z   return x.grad, w.grad
2025-12-04T12:12:57.6781029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6781140Z   warnings.warn(
2025-12-04T12:12:57.6783765Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6783923Z   return x.grad, w.grad
2025-12-04T12:12:57.6784099Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6784654Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.6784772Z Traceback (most recent call last):
2025-12-04T12:12:57.6785228Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6785465Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6785673Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6785681Z 
2025-12-04T12:12:57.6785890Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6786822Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6786862Z 
2025-12-04T12:12:57.6787124Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6787353Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6787461Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6787574Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6787920Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6788132Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6788241Z graph_break []
2025-12-04T12:12:57.6788451Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6791093Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6791211Z   return x.grad, w.grad
2025-12-04T12:12:57.6791930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6792039Z   warnings.warn(
2025-12-04T12:12:57.6794682Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6794799Z   return x.grad, w.grad
2025-12-04T12:12:57.6795009Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6795118Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6795245Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6795465Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6795841Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6795935Z graph_break []
2025-12-04T12:12:57.6796146Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6798843Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6798948Z   return x.grad, w.grad
2025-12-04T12:12:57.6799676Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6799778Z   warnings.warn(
2025-12-04T12:12:57.6802657Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6802838Z   return x.grad, w.grad
2025-12-04T12:12:57.6803055Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6803182Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6803300Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6803534Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6803866Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6803967Z graph_break []
2025-12-04T12:12:57.6804195Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6804915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6805015Z   warnings.warn(
2025-12-04T12:12:57.6807670Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.6807776Z   return x.grad, w.grad
2025-12-04T12:12:57.6808600Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c7f8a2bcbf5a7d94.xml -
2025-12-04T12:12:57.6808769Z =========================== short test summary info ============================
2025-12-04T12:12:57.6809831Z FAILED [0.1648s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6809837Z 
2025-12-04T12:12:57.6810049Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6811020Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6811040Z 
2025-12-04T12:12:57.6811340Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6811521Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6811731Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.6811830Z Got exit code 1
2025-12-04T12:12:57.6812719Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.6813137Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.6813762Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b44a26383ab5bf86.xml
2025-12-04T12:12:57.6813937Z ============================= test session starts ==============================
2025-12-04T12:12:57.6814288Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6814429Z cachedir: .pytest_cache
2025-12-04T12:12:57.6814952Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6815075Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6815196Z configfile: pytest.ini
2025-12-04T12:12:57.6815777Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6815998Z collecting ... collected 380 items / 58 deselected / 322 selected
2025-12-04T12:12:57.6816152Z stepcurrent: skipping 58 already run items.
2025-12-04T12:12:57.6816265Z Running 117 items in this shard
2025-12-04T12:12:57.6816270Z 
2025-12-04T12:12:57.6817163Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5306s] [  0%]
2025-12-04T12:12:57.6818067Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1613s] [  0%]
2025-12-04T12:12:57.6818879Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1578s] [  0%]
2025-12-04T12:12:57.6818884Z 
2025-12-04T12:12:57.6819033Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6819579Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6819714Z Traceback (most recent call last):
2025-12-04T12:12:57.6820176Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6820376Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6820597Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6820602Z 
2025-12-04T12:12:57.6820807Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6821733Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6821738Z 
2025-12-04T12:12:57.6821998Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6822245Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6822366Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6822477Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6822807Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6823064Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6823161Z graph_break []
2025-12-04T12:12:57.6823383Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6824104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6824238Z   warnings.warn(
2025-12-04T12:12:57.6824800Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6824924Z Traceback (most recent call last):
2025-12-04T12:12:57.6825397Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6825590Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6825798Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6825833Z 
2025-12-04T12:12:57.6826053Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6826972Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6826977Z 
2025-12-04T12:12:57.6827254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6827466Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6827577Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6827701Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6828031Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6828244Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6828352Z graph_break []
2025-12-04T12:12:57.6828567Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6829295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6829396Z   warnings.warn(
2025-12-04T12:12:57.6829605Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6829728Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6829839Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6830050Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6830392Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6830488Z graph_break []
2025-12-04T12:12:57.6830710Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6831427Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6831528Z   warnings.warn(
2025-12-04T12:12:57.6831682Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6832230Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6832353Z Traceback (most recent call last):
2025-12-04T12:12:57.6832823Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6833053Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6833271Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6833276Z 
2025-12-04T12:12:57.6833485Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6834444Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6834451Z 
2025-12-04T12:12:57.6834727Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6834938Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6835091Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6835206Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6835538Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6835768Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6835863Z graph_break []
2025-12-04T12:12:57.6836074Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6836805Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6836939Z   warnings.warn(
2025-12-04T12:12:57.6837163Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6837277Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6837389Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6837616Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6837948Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6838043Z graph_break []
2025-12-04T12:12:57.6838263Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6838973Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6839082Z   warnings.warn(
2025-12-04T12:12:57.6839294Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6839401Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6839523Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6839735Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6840061Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6840169Z graph_break []
2025-12-04T12:12:57.6840378Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6841099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6841198Z   warnings.warn(
2025-12-04T12:12:57.6842002Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b44a26383ab5bf86.xml -
2025-12-04T12:12:57.6842252Z =========================== short test summary info ============================
2025-12-04T12:12:57.6843305Z FAILED [0.1578s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6843311Z 
2025-12-04T12:12:57.6843539Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6844463Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6844508Z 
2025-12-04T12:12:57.6844769Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6844957Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6845250Z ================== 1 failed, 58 deselected, 2 rerun in 4.90s ===================
2025-12-04T12:12:57.6845362Z Got exit code 1
2025-12-04T12:12:57.6845465Z Retrying single test...
2025-12-04T12:12:57.6846092Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-7c4c7b2c97f5ece3.xml
2025-12-04T12:12:57.6846265Z ============================= test session starts ==============================
2025-12-04T12:12:57.6846636Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6846757Z cachedir: .pytest_cache
2025-12-04T12:12:57.6847264Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6847384Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6847507Z configfile: pytest.ini
2025-12-04T12:12:57.6848083Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6848337Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6849352Z stepcurrent: skipping 58 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6849465Z Running 1 items in this shard
2025-12-04T12:12:57.6849471Z 
2025-12-04T12:12:57.6850366Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5775s] [100%]
2025-12-04T12:12:57.6851247Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1649s] [100%]
2025-12-04T12:12:57.6852062Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1563s] [100%]
2025-12-04T12:12:57.6852067Z 
2025-12-04T12:12:57.6852204Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6852746Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6852876Z Traceback (most recent call last):
2025-12-04T12:12:57.6853334Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6853544Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6853750Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6853755Z 
2025-12-04T12:12:57.6853964Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6854896Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6854900Z 
2025-12-04T12:12:57.6855158Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6855384Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6855495Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6855608Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6855984Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6856198Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6856293Z graph_break []
2025-12-04T12:12:57.6856519Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6857286Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6857404Z   warnings.warn(
2025-12-04T12:12:57.6857953Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6858071Z Traceback (most recent call last):
2025-12-04T12:12:57.6858570Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6858764Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6858971Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6858989Z 
2025-12-04T12:12:57.6859199Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6860114Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6860147Z 
2025-12-04T12:12:57.6860419Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6860629Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6860737Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6860861Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6861194Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6861417Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6861513Z graph_break []
2025-12-04T12:12:57.6861721Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6862452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6862553Z   warnings.warn(
2025-12-04T12:12:57.6862763Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6862881Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6862994Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6863216Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6863545Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6863640Z graph_break []
2025-12-04T12:12:57.6863872Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6864584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6864684Z   warnings.warn(
2025-12-04T12:12:57.6864839Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6865391Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6865525Z Traceback (most recent call last):
2025-12-04T12:12:57.6865985Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6866180Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6866407Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6866412Z 
2025-12-04T12:12:57.6866622Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6867603Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6867609Z 
2025-12-04T12:12:57.6867898Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6868114Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6868239Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6868352Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6868699Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6868945Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6869043Z graph_break []
2025-12-04T12:12:57.6869264Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6869985Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6870084Z   warnings.warn(
2025-12-04T12:12:57.6870308Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6870452Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6870576Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6870790Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6871120Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6871231Z graph_break []
2025-12-04T12:12:57.6871442Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6872153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6872262Z   warnings.warn(
2025-12-04T12:12:57.6872474Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6872596Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6872705Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6872921Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6873269Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6873363Z graph_break []
2025-12-04T12:12:57.6873570Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6874295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6874394Z   warnings.warn(
2025-12-04T12:12:57.6875212Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-7c4c7b2c97f5ece3.xml -
2025-12-04T12:12:57.6875379Z =========================== short test summary info ============================
2025-12-04T12:12:57.6876424Z FAILED [0.1563s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6876433Z 
2025-12-04T12:12:57.6876654Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6877571Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6877576Z 
2025-12-04T12:12:57.6877845Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6878055Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6878247Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ==================
2025-12-04T12:12:57.6878356Z Got exit code 1
2025-12-04T12:12:57.6878458Z Retrying single test...
2025-12-04T12:12:57.6879126Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35670228d9257748.xml
2025-12-04T12:12:57.6879290Z ============================= test session starts ==============================
2025-12-04T12:12:57.6879629Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6879748Z cachedir: .pytest_cache
2025-12-04T12:12:57.6880282Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6880404Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6880522Z configfile: pytest.ini
2025-12-04T12:12:57.6881098Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6881330Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6882401Z stepcurrent: skipping 58 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6882558Z Running 1 items in this shard
2025-12-04T12:12:57.6882563Z 
2025-12-04T12:12:57.6883466Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5511s] [100%]
2025-12-04T12:12:57.6884354Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1625s] [100%]
2025-12-04T12:12:57.6885174Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1580s] [100%]
2025-12-04T12:12:57.6885182Z 
2025-12-04T12:12:57.6885319Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6885876Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6885994Z Traceback (most recent call last):
2025-12-04T12:12:57.6886453Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6886664Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6886874Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6886885Z 
2025-12-04T12:12:57.6887104Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6888025Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6888034Z 
2025-12-04T12:12:57.6888295Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6888519Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6888630Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6888743Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6889089Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6889303Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6889411Z graph_break []
2025-12-04T12:12:57.6889655Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6890372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6890481Z   warnings.warn(
2025-12-04T12:12:57.6891056Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6891191Z Traceback (most recent call last):
2025-12-04T12:12:57.6891651Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6891845Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6892094Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6892099Z 
2025-12-04T12:12:57.6892344Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6893261Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6893278Z 
2025-12-04T12:12:57.6893541Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6893785Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6893907Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6894020Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6894352Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6894580Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6894677Z graph_break []
2025-12-04T12:12:57.6894899Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6895615Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6895717Z   warnings.warn(
2025-12-04T12:12:57.6895941Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6896051Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6896169Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6896398Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6896731Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6896839Z graph_break []
2025-12-04T12:12:57.6897049Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6897767Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6897880Z   warnings.warn(
2025-12-04T12:12:57.6898018Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6898563Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.6898693Z Traceback (most recent call last):
2025-12-04T12:12:57.6899156Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6899363Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6899567Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6899572Z 
2025-12-04T12:12:57.6899781Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6900718Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6900782Z 
2025-12-04T12:12:57.6901204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6901429Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6901538Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6901719Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6902065Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6902277Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6902370Z graph_break []
2025-12-04T12:12:57.6902599Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6903352Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6903464Z   warnings.warn(
2025-12-04T12:12:57.6903678Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6903791Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6903916Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6904127Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6904458Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6904614Z graph_break []
2025-12-04T12:12:57.6904826Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6905548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6905645Z   warnings.warn(
2025-12-04T12:12:57.6905852Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6905971Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6906084Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6906298Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6906637Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6906731Z graph_break []
2025-12-04T12:12:57.6906953Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6907661Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6907757Z   warnings.warn(
2025-12-04T12:12:57.6908561Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35670228d9257748.xml -
2025-12-04T12:12:57.6908731Z =========================== short test summary info ============================
2025-12-04T12:12:57.6909795Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6909802Z 
2025-12-04T12:12:57.6910012Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6910938Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6910943Z 
2025-12-04T12:12:57.6911214Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6911390Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6911599Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.6911697Z Got exit code 1
2025-12-04T12:12:57.6912582Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.6912999Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.6913654Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c465169b2a187708.xml
2025-12-04T12:12:57.6913828Z ============================= test session starts ==============================
2025-12-04T12:12:57.6914170Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6914278Z cachedir: .pytest_cache
2025-12-04T12:12:57.6914832Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6914957Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6915067Z configfile: pytest.ini
2025-12-04T12:12:57.6915655Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6915876Z collecting ... collected 380 items / 59 deselected / 321 selected
2025-12-04T12:12:57.6916062Z stepcurrent: skipping 59 already run items.
2025-12-04T12:12:57.6916175Z Running 116 items in this shard
2025-12-04T12:12:57.6916181Z 
2025-12-04T12:12:57.6917184Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.6918082Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5353s] [  1%]
2025-12-04T12:12:57.6918966Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1576s] [  1%]
2025-12-04T12:12:57.6919786Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1566s] [  1%]
2025-12-04T12:12:57.6919794Z 
2025-12-04T12:12:57.6919932Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6920485Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6920605Z Traceback (most recent call last):
2025-12-04T12:12:57.6921067Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6921272Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6921477Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6921482Z 
2025-12-04T12:12:57.6921701Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6922691Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6922701Z 
2025-12-04T12:12:57.6922961Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6923187Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6923298Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6923425Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6923759Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6923974Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6924134Z graph_break []
2025-12-04T12:12:57.6924344Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6925105Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6925221Z   warnings.warn(
2025-12-04T12:12:57.6925769Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6925906Z Traceback (most recent call last):
2025-12-04T12:12:57.6926368Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6926594Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6926817Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6926825Z 
2025-12-04T12:12:57.6927038Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6927961Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6928015Z 
2025-12-04T12:12:57.6928279Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6928491Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6928614Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6928728Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6929060Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6929291Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6929390Z graph_break []
2025-12-04T12:12:57.6929614Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6930339Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6930441Z   warnings.warn(
2025-12-04T12:12:57.6930672Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6930784Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6930896Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6931128Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6931460Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6931570Z graph_break []
2025-12-04T12:12:57.6931784Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6932496Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6932615Z   warnings.warn(
2025-12-04T12:12:57.6932760Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6933312Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6933446Z Traceback (most recent call last):
2025-12-04T12:12:57.6933905Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6934110Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6934315Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6934320Z 
2025-12-04T12:12:57.6934533Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6935471Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6935514Z 
2025-12-04T12:12:57.6935772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6936026Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6936136Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6936247Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6936587Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6936801Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6936896Z graph_break []
2025-12-04T12:12:57.6937150Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6937867Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6937981Z   warnings.warn(
2025-12-04T12:12:57.6938187Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6938294Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6938416Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6938680Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6939007Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6939115Z graph_break []
2025-12-04T12:12:57.6939325Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6940046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6940143Z   warnings.warn(
2025-12-04T12:12:57.6940349Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6940468Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6940579Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6940791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6941132Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6941230Z graph_break []
2025-12-04T12:12:57.6941453Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6942159Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6942256Z   warnings.warn(
2025-12-04T12:12:57.6943072Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c465169b2a187708.xml -
2025-12-04T12:12:57.6943243Z =========================== short test summary info ============================
2025-12-04T12:12:57.6944309Z FAILED [0.1566s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6944319Z 
2025-12-04T12:12:57.6944530Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6945453Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6945471Z 
2025-12-04T12:12:57.6945737Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6945913Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6946136Z ============= 1 failed, 1 skipped, 59 deselected, 2 rerun in 4.91s =============
2025-12-04T12:12:57.6946267Z Got exit code 1
2025-12-04T12:12:57.6946371Z Retrying single test...
2025-12-04T12:12:57.6947011Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2b01ab5056f11e9c.xml
2025-12-04T12:12:57.6947199Z ============================= test session starts ==============================
2025-12-04T12:12:57.6947552Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6947659Z cachedir: .pytest_cache
2025-12-04T12:12:57.6948169Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6948304Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6948439Z configfile: pytest.ini
2025-12-04T12:12:57.6949013Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6949248Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6950246Z stepcurrent: skipping 60 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6950403Z Running 1 items in this shard
2025-12-04T12:12:57.6950408Z 
2025-12-04T12:12:57.6951290Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5303s] [100%]
2025-12-04T12:12:57.6952183Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1585s] [100%]
2025-12-04T12:12:57.6952981Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1542s] [100%]
2025-12-04T12:12:57.6952989Z 
2025-12-04T12:12:57.6953124Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6953679Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6953799Z Traceback (most recent call last):
2025-12-04T12:12:57.6954272Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6954465Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6954676Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6954681Z 
2025-12-04T12:12:57.6954902Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6955826Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6955831Z 
2025-12-04T12:12:57.6956105Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6956322Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6956434Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6956557Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6956889Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6957102Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6957214Z graph_break []
2025-12-04T12:12:57.6957425Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6958154Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6958289Z   warnings.warn(
2025-12-04T12:12:57.6958834Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6958998Z Traceback (most recent call last):
2025-12-04T12:12:57.6959457Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6959651Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6959870Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6959875Z 
2025-12-04T12:12:57.6960084Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6961048Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6961055Z 
2025-12-04T12:12:57.6961318Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6961540Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6961651Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6961790Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6962203Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6962420Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6962517Z graph_break []
2025-12-04T12:12:57.6962743Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6963461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6963576Z   warnings.warn(
2025-12-04T12:12:57.6963784Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6963894Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6964022Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6964232Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6964563Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6964672Z graph_break []
2025-12-04T12:12:57.6964879Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6965589Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6965699Z   warnings.warn(
2025-12-04T12:12:57.6965839Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6966405Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6966521Z Traceback (most recent call last):
2025-12-04T12:12:57.6966979Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6967186Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6967395Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6967400Z 
2025-12-04T12:12:57.6967618Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6968536Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6968541Z 
2025-12-04T12:12:57.6968799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6969065Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6969173Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6969297Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6969627Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6969872Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6969982Z graph_break []
2025-12-04T12:12:57.6970190Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6970908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6971046Z   warnings.warn(
2025-12-04T12:12:57.6971256Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6971375Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6971488Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6971703Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6972043Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6972138Z graph_break []
2025-12-04T12:12:57.6972382Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6973102Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6973201Z   warnings.warn(
2025-12-04T12:12:57.6973422Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6973533Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6973644Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6973871Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6974199Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6974294Z graph_break []
2025-12-04T12:12:57.6974514Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6975223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6975335Z   warnings.warn(
2025-12-04T12:12:57.6976134Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2b01ab5056f11e9c.xml -
2025-12-04T12:12:57.6976301Z =========================== short test summary info ============================
2025-12-04T12:12:57.6977367Z FAILED [0.1542s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6977375Z 
2025-12-04T12:12:57.6977584Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6978518Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6978525Z 
2025-12-04T12:12:57.6978784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6978958Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.6979164Z ================== 1 failed, 174 deselected, 2 rerun in 4.89s ==================
2025-12-04T12:12:57.6979261Z Got exit code 1
2025-12-04T12:12:57.6979378Z Retrying single test...
2025-12-04T12:12:57.6980004Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-16aca9496f35b1a4.xml
2025-12-04T12:12:57.6980196Z ============================= test session starts ==============================
2025-12-04T12:12:57.6980552Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.6980660Z cachedir: .pytest_cache
2025-12-04T12:12:57.6981196Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.6981339Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.6981446Z configfile: pytest.ini
2025-12-04T12:12:57.6982039Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.6982306Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.6983306Z stepcurrent: skipping 60 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6983436Z Running 1 items in this shard
2025-12-04T12:12:57.6983440Z 
2025-12-04T12:12:57.6984333Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5375s] [100%]
2025-12-04T12:12:57.6985266Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1595s] [100%]
2025-12-04T12:12:57.6986072Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1563s] [100%]
2025-12-04T12:12:57.6986078Z 
2025-12-04T12:12:57.6986232Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.6986777Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6986897Z Traceback (most recent call last):
2025-12-04T12:12:57.6987377Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6987573Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6987797Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6987802Z 
2025-12-04T12:12:57.6988011Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6988936Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6988941Z 
2025-12-04T12:12:57.6989220Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6989437Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6989564Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6989678Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6990012Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6990245Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6990343Z graph_break []
2025-12-04T12:12:57.6990553Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6991295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6991395Z   warnings.warn(
2025-12-04T12:12:57.6991955Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6992107Z Traceback (most recent call last):
2025-12-04T12:12:57.6992561Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.6992797Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.6993005Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.6993010Z 
2025-12-04T12:12:57.6993222Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.6994160Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.6994202Z 
2025-12-04T12:12:57.6994465Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.6994695Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6994809Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6994922Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6995269Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6995487Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6995630Z graph_break []
2025-12-04T12:12:57.6995841Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6996558Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6996670Z   warnings.warn(
2025-12-04T12:12:57.6996880Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.6996986Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.6997112Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.6997324Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.6997665Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.6997757Z graph_break []
2025-12-04T12:12:57.6997965Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.6998693Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.6998789Z   warnings.warn(
2025-12-04T12:12:57.6998929Z =================================== FAILURES ===================================
2025-12-04T12:12:57.6999483Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.6999599Z Traceback (most recent call last):
2025-12-04T12:12:57.7000069Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7000264Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7000465Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7000470Z 
2025-12-04T12:12:57.7000691Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7001798Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7001803Z 
2025-12-04T12:12:57.7002079Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7002388Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7002499Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7002624Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7003038Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7003266Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7003361Z graph_break []
2025-12-04T12:12:57.7003571Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7004338Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7004438Z   warnings.warn(
2025-12-04T12:12:57.7004646Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7004770Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7004880Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7005134Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7005478Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7005575Z graph_break []
2025-12-04T12:12:57.7005796Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7006510Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7006654Z   warnings.warn(
2025-12-04T12:12:57.7006874Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7006980Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7007090Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7007316Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7007645Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7007753Z graph_break []
2025-12-04T12:12:57.7007963Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7008677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7008787Z   warnings.warn(
2025-12-04T12:12:57.7009590Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-16aca9496f35b1a4.xml -
2025-12-04T12:12:57.7009770Z =========================== short test summary info ============================
2025-12-04T12:12:57.7010817Z FAILED [0.1563s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7010825Z 
2025-12-04T12:12:57.7011036Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7011970Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7011977Z 
2025-12-04T12:12:57.7012238Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7012430Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7012624Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:57.7012720Z Got exit code 1
2025-12-04T12:12:57.7013571Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7013972Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.7014608Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-193d78131cdd083a.xml
2025-12-04T12:12:57.7014806Z ============================= test session starts ==============================
2025-12-04T12:12:57.7015147Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7015296Z cachedir: .pytest_cache
2025-12-04T12:12:57.7015806Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7015939Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7016045Z configfile: pytest.ini
2025-12-04T12:12:57.7016620Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7016884Z collecting ... collected 380 items / 61 deselected / 319 selected
2025-12-04T12:12:57.7017024Z stepcurrent: skipping 61 already run items.
2025-12-04T12:12:57.7017137Z Running 114 items in this shard
2025-12-04T12:12:57.7017143Z 
2025-12-04T12:12:57.7018047Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5433s] [  0%]
2025-12-04T12:12:57.7018961Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1590s] [  0%]
2025-12-04T12:12:57.7019777Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1561s] [  0%]
2025-12-04T12:12:57.7019783Z 
2025-12-04T12:12:57.7019923Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7020477Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7020600Z Traceback (most recent call last):
2025-12-04T12:12:57.7021060Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7021264Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7021475Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7021479Z 
2025-12-04T12:12:57.7021685Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7022614Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7022619Z 
2025-12-04T12:12:57.7022880Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7023104Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7023217Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7023332Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7023675Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7023890Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7024000Z graph_break []
2025-12-04T12:12:57.7024210Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7024930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7025040Z   warnings.warn(
2025-12-04T12:12:57.7025584Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7025702Z Traceback (most recent call last):
2025-12-04T12:12:57.7026299Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7026490Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7026707Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7026713Z 
2025-12-04T12:12:57.7026972Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7027894Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7027912Z 
2025-12-04T12:12:57.7028171Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7028415Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7028537Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7028649Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7028978Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7029203Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7029297Z graph_break []
2025-12-04T12:12:57.7029508Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7030267Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7030365Z   warnings.warn(
2025-12-04T12:12:57.7030585Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7030692Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7030808Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7031035Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7031365Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7031463Z graph_break []
2025-12-04T12:12:57.7031682Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7032399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7032513Z   warnings.warn(
2025-12-04T12:12:57.7032653Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7033204Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7033335Z Traceback (most recent call last):
2025-12-04T12:12:57.7033795Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7034003Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7034213Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7034217Z 
2025-12-04T12:12:57.7034423Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7035357Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7035365Z 
2025-12-04T12:12:57.7035624Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7035851Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7035959Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7036072Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7036413Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7036626Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7036760Z graph_break []
2025-12-04T12:12:57.7048253Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7049194Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7049313Z   warnings.warn(
2025-12-04T12:12:57.7049558Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7049680Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7049815Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7050041Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7050426Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7050546Z graph_break []
2025-12-04T12:12:57.7050773Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7051517Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7051637Z   warnings.warn(
2025-12-04T12:12:57.7051860Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7052027Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7052146Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7052370Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7052722Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7052824Z graph_break []
2025-12-04T12:12:57.7053042Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7053799Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7053906Z   warnings.warn(
2025-12-04T12:12:57.7054724Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-193d78131cdd083a.xml -
2025-12-04T12:12:57.7054900Z =========================== short test summary info ============================
2025-12-04T12:12:57.7055958Z FAILED [0.1561s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7055981Z 
2025-12-04T12:12:57.7056196Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7057128Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7057136Z 
2025-12-04T12:12:57.7057409Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7057588Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7057791Z ================== 1 failed, 61 deselected, 2 rerun in 4.91s ===================
2025-12-04T12:12:57.7057911Z Got exit code 1
2025-12-04T12:12:57.7058021Z Retrying single test...
2025-12-04T12:12:57.7058671Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dab7c947d86aa9a6.xml
2025-12-04T12:12:57.7058839Z ============================= test session starts ==============================
2025-12-04T12:12:57.7059189Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7059313Z cachedir: .pytest_cache
2025-12-04T12:12:57.7059826Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7060006Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7060129Z configfile: pytest.ini
2025-12-04T12:12:57.7060743Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7060992Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7062008Z stepcurrent: skipping 61 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7062127Z Running 1 items in this shard
2025-12-04T12:12:57.7062132Z 
2025-12-04T12:12:57.7063071Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5406s] [100%]
2025-12-04T12:12:57.7063965Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1592s] [100%]
2025-12-04T12:12:57.7064793Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1560s] [100%]
2025-12-04T12:12:57.7064833Z 
2025-12-04T12:12:57.7064978Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7065542Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7065672Z Traceback (most recent call last):
2025-12-04T12:12:57.7066141Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7066360Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7066574Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7066580Z 
2025-12-04T12:12:57.7066794Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7067731Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7067739Z 
2025-12-04T12:12:57.7068036Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7068278Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7068408Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7068622Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7073350Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7073600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7073712Z graph_break []
2025-12-04T12:12:57.7073928Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7074645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7074749Z   warnings.warn(
2025-12-04T12:12:57.7075287Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7075402Z Traceback (most recent call last):
2025-12-04T12:12:57.7075864Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7076052Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7076298Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7076305Z 
2025-12-04T12:12:57.7076509Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7077457Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7077473Z 
2025-12-04T12:12:57.7077737Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7077948Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7078068Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7078175Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7078535Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7078757Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7078851Z graph_break []
2025-12-04T12:12:57.7079058Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7079779Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7080495Z   warnings.warn(
2025-12-04T12:12:57.7080710Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7080813Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7080919Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7081138Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7081460Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7081555Z graph_break []
2025-12-04T12:12:57.7081767Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7082570Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7082677Z   warnings.warn(
2025-12-04T12:12:57.7082814Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7083361Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7083482Z Traceback (most recent call last):
2025-12-04T12:12:57.7083935Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7084124Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7084333Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7084339Z 
2025-12-04T12:12:57.7084540Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7106110Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7106127Z 
2025-12-04T12:12:57.7106415Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7106634Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7106748Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7106854Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7107188Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7107397Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7107486Z graph_break []
2025-12-04T12:12:57.7107705Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7108421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7108625Z   warnings.warn(
2025-12-04T12:12:57.7108836Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7108939Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7109108Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7109320Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7109643Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7109744Z graph_break []
2025-12-04T12:12:57.7109948Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7110705Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7110812Z   warnings.warn(
2025-12-04T12:12:57.7111016Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7111130Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7111244Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7111457Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7111872Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7111967Z graph_break []
2025-12-04T12:12:57.7112174Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7112897Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7112996Z   warnings.warn(
2025-12-04T12:12:57.7113816Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dab7c947d86aa9a6.xml -
2025-12-04T12:12:57.7113987Z =========================== short test summary info ============================
2025-12-04T12:12:57.7115056Z FAILED [0.1560s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7115078Z 
2025-12-04T12:12:57.7115294Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7116220Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7116225Z 
2025-12-04T12:12:57.7116499Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7116675Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7116886Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:57.7116983Z Got exit code 1
2025-12-04T12:12:57.7117089Z Retrying single test...
2025-12-04T12:12:57.7117731Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a0e52521b9f6fa85.xml
2025-12-04T12:12:57.7117891Z ============================= test session starts ==============================
2025-12-04T12:12:57.7118236Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7118355Z cachedir: .pytest_cache
2025-12-04T12:12:57.7118868Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7119004Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7119108Z configfile: pytest.ini
2025-12-04T12:12:57.7119683Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7120010Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7121127Z stepcurrent: skipping 61 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7121246Z Running 1 items in this shard
2025-12-04T12:12:57.7121265Z 
2025-12-04T12:12:57.7122219Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5527s] [100%]
2025-12-04T12:12:57.7123143Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1589s] [100%]
2025-12-04T12:12:57.7123970Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1584s] [100%]
2025-12-04T12:12:57.7123976Z 
2025-12-04T12:12:57.7124115Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7124711Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7124830Z Traceback (most recent call last):
2025-12-04T12:12:57.7125293Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7125499Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7125705Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7125710Z 
2025-12-04T12:12:57.7125932Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7126854Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7126859Z 
2025-12-04T12:12:57.7127121Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7127351Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7127460Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7127591Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7127923Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7128138Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7128246Z graph_break []
2025-12-04T12:12:57.7128460Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7129182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7129292Z   warnings.warn(
2025-12-04T12:12:57.7129840Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7129973Z Traceback (most recent call last):
2025-12-04T12:12:57.7130432Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7130625Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7130842Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7130848Z 
2025-12-04T12:12:57.7131056Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7131984Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7132022Z 
2025-12-04T12:12:57.7132281Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7132521Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7132646Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7132757Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7133089Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7133315Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7133411Z graph_break []
2025-12-04T12:12:57.7133662Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7134383Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7134486Z   warnings.warn(
2025-12-04T12:12:57.7134708Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7134814Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7134926Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7135188Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7135519Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7135627Z graph_break []
2025-12-04T12:12:57.7135835Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7136551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7136664Z   warnings.warn(
2025-12-04T12:12:57.7136804Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7137365Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7137485Z Traceback (most recent call last):
2025-12-04T12:12:57.7137944Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7138152Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7138356Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7138361Z 
2025-12-04T12:12:57.7138571Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7139513Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7139518Z 
2025-12-04T12:12:57.7139780Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7140003Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7140112Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7140224Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7140566Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7140781Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7140889Z graph_break []
2025-12-04T12:12:57.7141098Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7141813Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7141925Z   warnings.warn(
2025-12-04T12:12:57.7142134Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7142273Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7142396Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7142612Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7142983Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7143081Z graph_break []
2025-12-04T12:12:57.7143288Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7144012Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7144109Z   warnings.warn(
2025-12-04T12:12:57.7144346Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7144468Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7144578Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7144806Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7145134Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7145230Z graph_break []
2025-12-04T12:12:57.7145451Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7146193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7146290Z   warnings.warn(
2025-12-04T12:12:57.7147102Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a0e52521b9f6fa85.xml -
2025-12-04T12:12:57.7147271Z =========================== short test summary info ============================
2025-12-04T12:12:57.7148333Z FAILED [0.1584s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7148341Z 
2025-12-04T12:12:57.7148553Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7149475Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7149495Z 
2025-12-04T12:12:57.7149756Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7149932Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7150141Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.7150238Z Got exit code 1
2025-12-04T12:12:57.7151069Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7151485Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.7152109Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5bf2204027ce2523.xml
2025-12-04T12:12:57.7152287Z ============================= test session starts ==============================
2025-12-04T12:12:57.7152627Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7152734Z cachedir: .pytest_cache
2025-12-04T12:12:57.7153257Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7153381Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7153504Z configfile: pytest.ini
2025-12-04T12:12:57.7154110Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7154330Z collecting ... collected 380 items / 62 deselected / 318 selected
2025-12-04T12:12:57.7154484Z stepcurrent: skipping 62 already run items.
2025-12-04T12:12:57.7154628Z Running 113 items in this shard
2025-12-04T12:12:57.7154633Z 
2025-12-04T12:12:57.7155648Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.7156583Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5302s] [  1%]
2025-12-04T12:12:57.7157465Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1578s] [  1%]
2025-12-04T12:12:57.7158287Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1549s] [  1%]
2025-12-04T12:12:57.7158323Z 
2025-12-04T12:12:57.7158464Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7159024Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7159144Z Traceback (most recent call last):
2025-12-04T12:12:57.7159607Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7159812Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7160019Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7160026Z 
2025-12-04T12:12:57.7160250Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7161176Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7161183Z 
2025-12-04T12:12:57.7161449Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7161676Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7161788Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7161918Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7162377Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7162591Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7162705Z graph_break []
2025-12-04T12:12:57.7162921Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7163645Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7163765Z   warnings.warn(
2025-12-04T12:12:57.7164308Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7164439Z Traceback (most recent call last):
2025-12-04T12:12:57.7164899Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7165098Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7165323Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7165328Z 
2025-12-04T12:12:57.7165539Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7166528Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7166534Z 
2025-12-04T12:12:57.7166825Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7167041Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7167162Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7167276Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7167631Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7167881Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7167983Z graph_break []
2025-12-04T12:12:57.7168209Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7168928Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7169028Z   warnings.warn(
2025-12-04T12:12:57.7169250Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7169396Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7169526Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7169741Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7170073Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7170185Z graph_break []
2025-12-04T12:12:57.7170393Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7171104Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7171216Z   warnings.warn(
2025-12-04T12:12:57.7171356Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7171913Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7172035Z Traceback (most recent call last):
2025-12-04T12:12:57.7172498Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7172701Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7172908Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7172913Z 
2025-12-04T12:12:57.7173119Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7174059Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7174066Z 
2025-12-04T12:12:57.7174324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7174549Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7174660Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7174773Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7175116Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7175329Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7175438Z graph_break []
2025-12-04T12:12:57.7175647Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7176367Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7176520Z   warnings.warn(
2025-12-04T12:12:57.7176729Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7176839Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7176965Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7177177Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7177552Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7177649Z graph_break []
2025-12-04T12:12:57.7177858Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7178585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7178716Z   warnings.warn(
2025-12-04T12:12:57.7178927Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7179050Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7179162Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7179392Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7179722Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7179878Z graph_break []
2025-12-04T12:12:57.7180101Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7180814Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7180912Z   warnings.warn(
2025-12-04T12:12:57.7181725Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5bf2204027ce2523.xml -
2025-12-04T12:12:57.7181892Z =========================== short test summary info ============================
2025-12-04T12:12:57.7182966Z FAILED [0.1549s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7182973Z 
2025-12-04T12:12:57.7183190Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7184132Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7184137Z 
2025-12-04T12:12:57.7184397Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7184578Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7184803Z ============= 1 failed, 1 skipped, 62 deselected, 2 rerun in 4.90s =============
2025-12-04T12:12:57.7184902Z Got exit code 1
2025-12-04T12:12:57.7185009Z Retrying single test...
2025-12-04T12:12:57.7185648Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d6d9569795b0b902.xml
2025-12-04T12:12:57.7185809Z ============================= test session starts ==============================
2025-12-04T12:12:57.7186172Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7186278Z cachedir: .pytest_cache
2025-12-04T12:12:57.7186788Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7186924Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7187029Z configfile: pytest.ini
2025-12-04T12:12:57.7187607Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7187886Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7188887Z stepcurrent: skipping 63 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7189049Z Running 1 items in this shard
2025-12-04T12:12:57.7189054Z 
2025-12-04T12:12:57.7189948Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5279s] [100%]
2025-12-04T12:12:57.7190888Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1583s] [100%]
2025-12-04T12:12:57.7191691Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1561s] [100%]
2025-12-04T12:12:57.7191698Z 
2025-12-04T12:12:57.7191835Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7192396Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7192545Z Traceback (most recent call last):
2025-12-04T12:12:57.7193020Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7193213Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7193418Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7193423Z 
2025-12-04T12:12:57.7193645Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7194571Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7194578Z 
2025-12-04T12:12:57.7194850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7195064Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7195177Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7195306Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7195644Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7195873Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7195972Z graph_break []
2025-12-04T12:12:57.7196187Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7196919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7197020Z   warnings.warn(
2025-12-04T12:12:57.7197580Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7197711Z Traceback (most recent call last):
2025-12-04T12:12:57.7198177Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7198385Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7198601Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7198606Z 
2025-12-04T12:12:57.7198815Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7199756Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7199805Z 
2025-12-04T12:12:57.7200066Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7200292Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7200405Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7200519Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7201174Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7201401Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7201498Z graph_break []
2025-12-04T12:12:57.7201720Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7202556Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7202673Z   warnings.warn(
2025-12-04T12:12:57.7202889Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7202999Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7203124Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7203341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7203674Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7203830Z graph_break []
2025-12-04T12:12:57.7204041Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7204769Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7204868Z   warnings.warn(
2025-12-04T12:12:57.7205012Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7205576Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7205698Z Traceback (most recent call last):
2025-12-04T12:12:57.7206162Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7206374Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7206585Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7206590Z 
2025-12-04T12:12:57.7206817Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7207742Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7207750Z 
2025-12-04T12:12:57.7208067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7208279Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7208389Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7208516Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7208847Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7209062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7209172Z graph_break []
2025-12-04T12:12:57.7209381Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7210091Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7210203Z   warnings.warn(
2025-12-04T12:12:57.7210417Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7210545Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7210657Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7210924Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7211272Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7211369Z graph_break []
2025-12-04T12:12:57.7211623Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7212335Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7212432Z   warnings.warn(
2025-12-04T12:12:57.7212658Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7212767Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7212911Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7213140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7213470Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7213582Z graph_break []
2025-12-04T12:12:57.7213795Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7214508Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7214655Z   warnings.warn(
2025-12-04T12:12:57.7215459Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d6d9569795b0b902.xml -
2025-12-04T12:12:57.7215643Z =========================== short test summary info ============================
2025-12-04T12:12:57.7216695Z FAILED [0.1561s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7216703Z 
2025-12-04T12:12:57.7216915Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7217855Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7217863Z 
2025-12-04T12:12:57.7218125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7218313Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7218509Z ================== 1 failed, 174 deselected, 2 rerun in 4.89s ==================
2025-12-04T12:12:57.7218607Z Got exit code 1
2025-12-04T12:12:57.7218730Z Retrying single test...
2025-12-04T12:12:57.7219360Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4c910b821c44d2f5.xml
2025-12-04T12:12:57.7219539Z ============================= test session starts ==============================
2025-12-04T12:12:57.7219882Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7219992Z cachedir: .pytest_cache
2025-12-04T12:12:57.7220517Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7220641Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7220748Z configfile: pytest.ini
2025-12-04T12:12:57.7221346Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7221568Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7222588Z stepcurrent: skipping 63 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7222744Z Running 1 items in this shard
2025-12-04T12:12:57.7222749Z 
2025-12-04T12:12:57.7223673Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5425s] [100%]
2025-12-04T12:12:57.7224572Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1592s] [100%]
2025-12-04T12:12:57.7225414Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1574s] [100%]
2025-12-04T12:12:57.7225420Z 
2025-12-04T12:12:57.7225575Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7226121Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7226253Z Traceback (most recent call last):
2025-12-04T12:12:57.7226715Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7226939Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7227160Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7227165Z 
2025-12-04T12:12:57.7227375Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7228311Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7228316Z 
2025-12-04T12:12:57.7228577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7228792Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7228913Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7229028Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7229357Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7229587Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7229682Z graph_break []
2025-12-04T12:12:57.7229906Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7230628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7230730Z   warnings.warn(
2025-12-04T12:12:57.7231288Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7231408Z Traceback (most recent call last):
2025-12-04T12:12:57.7231870Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7232078Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7232288Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7232295Z 
2025-12-04T12:12:57.7232519Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7233437Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7233443Z 
2025-12-04T12:12:57.7233717Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7233934Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7234076Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7234200Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7234529Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7234744Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7234854Z graph_break []
2025-12-04T12:12:57.7235093Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7235817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7235928Z   warnings.warn(
2025-12-04T12:12:57.7236138Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7236372Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7236486Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7236701Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7237042Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7237138Z graph_break []
2025-12-04T12:12:57.7237348Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7238072Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7238205Z   warnings.warn(
2025-12-04T12:12:57.7238362Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7238906Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7239026Z Traceback (most recent call last):
2025-12-04T12:12:57.7239499Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7239693Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7239912Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7239917Z 
2025-12-04T12:12:57.7240126Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7241054Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7241061Z 
2025-12-04T12:12:57.7241336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7241547Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7241672Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7241783Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7242193Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7242430Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7242528Z graph_break []
2025-12-04T12:12:57.7242738Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7243482Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7243586Z   warnings.warn(
2025-12-04T12:12:57.7243813Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7243923Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7244036Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7244270Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7244599Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7244775Z graph_break []
2025-12-04T12:12:57.7245001Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7245719Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7245832Z   warnings.warn(
2025-12-04T12:12:57.7246084Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7246194Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7246326Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7246543Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7246873Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7247016Z graph_break []
2025-12-04T12:12:57.7247229Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7247958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7248059Z   warnings.warn(
2025-12-04T12:12:57.7248859Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4c910b821c44d2f5.xml -
2025-12-04T12:12:57.7249078Z =========================== short test summary info ============================
2025-12-04T12:12:57.7250129Z FAILED [0.1574s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7250136Z 
2025-12-04T12:12:57.7250366Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7251291Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7251299Z 
2025-12-04T12:12:57.7251564Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7251758Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7251956Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:57.7252066Z Got exit code 1
2025-12-04T12:12:57.7252916Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7253321Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.7253963Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-95077883b5abbff3.xml
2025-12-04T12:12:57.7254127Z ============================= test session starts ==============================
2025-12-04T12:12:57.7254485Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7254594Z cachedir: .pytest_cache
2025-12-04T12:12:57.7255101Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7255236Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7255343Z configfile: pytest.ini
2025-12-04T12:12:57.7255920Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7256156Z collecting ... collected 380 items / 64 deselected / 316 selected
2025-12-04T12:12:57.7256295Z stepcurrent: skipping 64 already run items.
2025-12-04T12:12:57.7256419Z Running 111 items in this shard
2025-12-04T12:12:57.7256474Z 
2025-12-04T12:12:57.7257367Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5344s] [  0%]
2025-12-04T12:12:57.7258287Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1606s] [  0%]
2025-12-04T12:12:57.7259107Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1580s] [  0%]
2025-12-04T12:12:57.7259112Z 
2025-12-04T12:12:57.7259289Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7259849Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7259970Z Traceback (most recent call last):
2025-12-04T12:12:57.7260432Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7260641Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7260850Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7260887Z 
2025-12-04T12:12:57.7261110Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7262027Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7262032Z 
2025-12-04T12:12:57.7262307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7262522Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7262633Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7262760Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7263090Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7263305Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7263419Z graph_break []
2025-12-04T12:12:57.7263634Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7264371Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7264471Z   warnings.warn(
2025-12-04T12:12:57.7265023Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7265159Z Traceback (most recent call last):
2025-12-04T12:12:57.7265618Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7265813Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7266030Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7266035Z 
2025-12-04T12:12:57.7266242Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7267179Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7267186Z 
2025-12-04T12:12:57.7267446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7267658Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7267780Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7267892Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7268236Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7268531Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7268628Z graph_break []
2025-12-04T12:12:57.7268854Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7269611Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7269713Z   warnings.warn(
2025-12-04T12:12:57.7269937Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7270045Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7270175Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7270426Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7270762Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7270876Z graph_break []
2025-12-04T12:12:57.7271083Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7271794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7271949Z   warnings.warn(
2025-12-04T12:12:57.7272091Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7272651Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7272771Z Traceback (most recent call last):
2025-12-04T12:12:57.7273235Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7273445Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7273655Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7273663Z 
2025-12-04T12:12:57.7273871Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7274810Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7274817Z 
2025-12-04T12:12:57.7275078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7275305Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7275416Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7275530Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7275878Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7276092Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7276207Z graph_break []
2025-12-04T12:12:57.7276417Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7277130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7277250Z   warnings.warn(
2025-12-04T12:12:57.7277462Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7277573Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7277700Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7277915Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7278257Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7278356Z graph_break []
2025-12-04T12:12:57.7278565Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7279337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7279436Z   warnings.warn(
2025-12-04T12:12:57.7279647Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7279805Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7279922Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7280151Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7280477Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7280574Z graph_break []
2025-12-04T12:12:57.7280801Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7281541Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7281644Z   warnings.warn(
2025-12-04T12:12:57.7282532Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-95077883b5abbff3.xml -
2025-12-04T12:12:57.7282704Z =========================== short test summary info ============================
2025-12-04T12:12:57.7283826Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7283832Z 
2025-12-04T12:12:57.7284044Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7284982Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7284989Z 
2025-12-04T12:12:57.7285248Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7285422Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7285627Z ================== 1 failed, 64 deselected, 2 rerun in 4.91s ===================
2025-12-04T12:12:57.7285728Z Got exit code 1
2025-12-04T12:12:57.7285835Z Retrying single test...
2025-12-04T12:12:57.7286479Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4d3bae777d67a79f.xml
2025-12-04T12:12:57.7286636Z ============================= test session starts ==============================
2025-12-04T12:12:57.7286989Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7287100Z cachedir: .pytest_cache
2025-12-04T12:12:57.7287606Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7287741Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7287849Z configfile: pytest.ini
2025-12-04T12:12:57.7288436Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7288663Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7289664Z stepcurrent: skipping 64 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7289790Z Running 1 items in this shard
2025-12-04T12:12:57.7289795Z 
2025-12-04T12:12:57.7290687Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5461s] [100%]
2025-12-04T12:12:57.7291618Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1594s] [100%]
2025-12-04T12:12:57.7292465Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1576s] [100%]
2025-12-04T12:12:57.7292473Z 
2025-12-04T12:12:57.7292624Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7293166Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7293288Z Traceback (most recent call last):
2025-12-04T12:12:57.7293794Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7293989Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7294201Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7294206Z 
2025-12-04T12:12:57.7294428Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7295353Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7295389Z 
2025-12-04T12:12:57.7295664Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7295880Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7295991Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7296116Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7296449Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7296675Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7296772Z graph_break []
2025-12-04T12:12:57.7296980Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7297713Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7297814Z   warnings.warn(
2025-12-04T12:12:57.7298360Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7298491Z Traceback (most recent call last):
2025-12-04T12:12:57.7298950Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7299157Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7299363Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7299369Z 
2025-12-04T12:12:57.7299577Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7300506Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7300515Z 
2025-12-04T12:12:57.7300776Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7301182Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7301296Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7301411Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7301759Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7301976Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7302074Z graph_break []
2025-12-04T12:12:57.7302300Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7303106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7303222Z   warnings.warn(
2025-12-04T12:12:57.7303473Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7303587Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7303715Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7303932Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7304260Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7304370Z graph_break []
2025-12-04T12:12:57.7304631Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7305357Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7305455Z   warnings.warn(
2025-12-04T12:12:57.7305594Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7306155Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7306318Z Traceback (most recent call last):
2025-12-04T12:12:57.7306793Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7306988Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7307194Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7307199Z 
2025-12-04T12:12:57.7307423Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7308353Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7308360Z 
2025-12-04T12:12:57.7308639Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7308854Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7308967Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7309098Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7309428Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7309643Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7309751Z graph_break []
2025-12-04T12:12:57.7309966Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7310696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7310798Z   warnings.warn(
2025-12-04T12:12:57.7311008Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7311131Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7311243Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7311464Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7311803Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7311899Z graph_break []
2025-12-04T12:12:57.7312106Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7312832Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7312930Z   warnings.warn(
2025-12-04T12:12:57.7313152Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7313294Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7313407Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7313637Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7313993Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7314095Z graph_break []
2025-12-04T12:12:57.7314316Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7315026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7315135Z   warnings.warn(
2025-12-04T12:12:57.7315961Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4d3bae777d67a79f.xml -
2025-12-04T12:12:57.7316133Z =========================== short test summary info ============================
2025-12-04T12:12:57.7317189Z FAILED [0.1576s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7317227Z 
2025-12-04T12:12:57.7317439Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7318371Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7318377Z 
2025-12-04T12:12:57.7318636Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7318823Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7319018Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.7319118Z Got exit code 1
2025-12-04T12:12:57.7319238Z Retrying single test...
2025-12-04T12:12:57.7319862Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-118ad8744f1d4d27.xml
2025-12-04T12:12:57.7320026Z ============================= test session starts ==============================
2025-12-04T12:12:57.7320377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7320482Z cachedir: .pytest_cache
2025-12-04T12:12:57.7321009Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7321131Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7321241Z configfile: pytest.ini
2025-12-04T12:12:57.7321831Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7322057Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7323151Z stepcurrent: skipping 64 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7323269Z Running 1 items in this shard
2025-12-04T12:12:57.7323274Z 
2025-12-04T12:12:57.7324156Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5599s] [100%]
2025-12-04T12:12:57.7325059Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1637s] [100%]
2025-12-04T12:12:57.7325900Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1593s] [100%]
2025-12-04T12:12:57.7325905Z 
2025-12-04T12:12:57.7326056Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7326638Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7326759Z Traceback (most recent call last):
2025-12-04T12:12:57.7327241Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7327434Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7327699Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7327705Z 
2025-12-04T12:12:57.7327916Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7328840Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7328858Z 
2025-12-04T12:12:57.7329117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7329369Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7329492Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7329602Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7329933Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7330160Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7330272Z graph_break []
2025-12-04T12:12:57.7330483Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7331213Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7331314Z   warnings.warn(
2025-12-04T12:12:57.7331873Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7331999Z Traceback (most recent call last):
2025-12-04T12:12:57.7332459Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7332667Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7332873Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7332877Z 
2025-12-04T12:12:57.7333105Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7334024Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7334031Z 
2025-12-04T12:12:57.7334291Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7334519Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7334633Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7334762Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7335094Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7335311Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7335423Z graph_break []
2025-12-04T12:12:57.7335634Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7336350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7336502Z   warnings.warn(
2025-12-04T12:12:57.7336716Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7336841Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7336955Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7337205Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7337552Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7337649Z graph_break []
2025-12-04T12:12:57.7337857Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7338585Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7338712Z   warnings.warn(
2025-12-04T12:12:57.7339144Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7340020Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7340145Z Traceback (most recent call last):
2025-12-04T12:12:57.7340632Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7340886Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7341097Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7341117Z 
2025-12-04T12:12:57.7341337Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7342288Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7342294Z 
2025-12-04T12:12:57.7342578Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7342801Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7342914Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7343041Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7343383Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7343618Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7343717Z graph_break []
2025-12-04T12:12:57.7343933Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7344809Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7344914Z   warnings.warn(
2025-12-04T12:12:57.7345129Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7345251Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7345367Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7345601Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7345940Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7346037Z graph_break []
2025-12-04T12:12:57.7346271Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7346998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7347102Z   warnings.warn(
2025-12-04T12:12:57.7347334Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7347447Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7347578Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7347804Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7348186Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7348298Z graph_break []
2025-12-04T12:12:57.7348517Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7349284Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7349401Z   warnings.warn(
2025-12-04T12:12:57.7350221Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-118ad8744f1d4d27.xml -
2025-12-04T12:12:57.7350407Z =========================== short test summary info ============================
2025-12-04T12:12:57.7351529Z FAILED [0.1593s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7351538Z 
2025-12-04T12:12:57.7351759Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7352728Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7352765Z 
2025-12-04T12:12:57.7353035Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7353228Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7353428Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.7353534Z Got exit code 1
2025-12-04T12:12:57.7354411Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7354829Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.7355479Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61456af580a4b7ac.xml
2025-12-04T12:12:57.7355772Z ============================= test session starts ==============================
2025-12-04T12:12:57.7356113Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7356234Z cachedir: .pytest_cache
2025-12-04T12:12:57.7356739Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7356872Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7356976Z configfile: pytest.ini
2025-12-04T12:12:57.7357549Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7357784Z collecting ... collected 380 items / 65 deselected / 315 selected
2025-12-04T12:12:57.7357923Z stepcurrent: skipping 65 already run items.
2025-12-04T12:12:57.7358033Z Running 110 items in this shard
2025-12-04T12:12:57.7358052Z 
2025-12-04T12:12:57.7359053Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.7359940Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5603s] [  1%]
2025-12-04T12:12:57.7360834Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1575s] [  1%]
2025-12-04T12:12:57.7361668Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1578s] [  1%]
2025-12-04T12:12:57.7361674Z 
2025-12-04T12:12:57.7361867Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7362469Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7362593Z Traceback (most recent call last):
2025-12-04T12:12:57.7363071Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7363303Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7363526Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7363534Z 
2025-12-04T12:12:57.7363743Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7364662Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7364713Z 
2025-12-04T12:12:57.7364975Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7365191Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7365318Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7365429Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7365758Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7365986Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7366082Z graph_break []
2025-12-04T12:12:57.7366295Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7367031Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7367129Z   warnings.warn(
2025-12-04T12:12:57.7367686Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7367807Z Traceback (most recent call last):
2025-12-04T12:12:57.7368271Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7368474Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7368678Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7368685Z 
2025-12-04T12:12:57.7368905Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7369823Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7369831Z 
2025-12-04T12:12:57.7370090Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7370320Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7370432Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7370558Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7370888Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7371101Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7371211Z graph_break []
2025-12-04T12:12:57.7371423Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7372141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7372285Z   warnings.warn(
2025-12-04T12:12:57.7372494Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7372616Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7372781Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7372994Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7373335Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7373430Z graph_break []
2025-12-04T12:12:57.7373639Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7374391Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7374489Z   warnings.warn(
2025-12-04T12:12:57.7374643Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7375191Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7375309Z Traceback (most recent call last):
2025-12-04T12:12:57.7375815Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7376008Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7376214Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7376231Z 
2025-12-04T12:12:57.7376437Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7377361Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7377368Z 
2025-12-04T12:12:57.7377638Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7377851Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7377960Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7378085Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7378417Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7378644Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7378739Z graph_break []
2025-12-04T12:12:57.7378947Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7379679Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7379777Z   warnings.warn(
2025-12-04T12:12:57.7379987Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7380107Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7380218Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7380444Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7380780Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7380877Z graph_break []
2025-12-04T12:12:57.7381104Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7381816Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7381916Z   warnings.warn(
2025-12-04T12:12:57.7382140Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7382248Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7382406Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7382622Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7382949Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7383057Z graph_break []
2025-12-04T12:12:57.7383293Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7384004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7384114Z   warnings.warn(
2025-12-04T12:12:57.7384913Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61456af580a4b7ac.xml -
2025-12-04T12:12:57.7385122Z =========================== short test summary info ============================
2025-12-04T12:12:57.7386174Z FAILED [0.1578s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7386182Z 
2025-12-04T12:12:57.7386403Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7387356Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7387361Z 
2025-12-04T12:12:57.7387621Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7387810Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7388020Z ============= 1 failed, 1 skipped, 65 deselected, 2 rerun in 4.93s =============
2025-12-04T12:12:57.7388128Z Got exit code 1
2025-12-04T12:12:57.7388233Z Retrying single test...
2025-12-04T12:12:57.7388862Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e90a690ff72dc1ab.xml
2025-12-04T12:12:57.7389035Z ============================= test session starts ==============================
2025-12-04T12:12:57.7389377Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7389486Z cachedir: .pytest_cache
2025-12-04T12:12:57.7390005Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7390126Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7390247Z configfile: pytest.ini
2025-12-04T12:12:57.7390832Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7391056Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7392074Z stepcurrent: skipping 66 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7392188Z Running 1 items in this shard
2025-12-04T12:12:57.7392195Z 
2025-12-04T12:12:57.7393095Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5396s] [100%]
2025-12-04T12:12:57.7393978Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1596s] [100%]
2025-12-04T12:12:57.7394785Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1592s] [100%]
2025-12-04T12:12:57.7394834Z 
2025-12-04T12:12:57.7394974Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7395523Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7395690Z Traceback (most recent call last):
2025-12-04T12:12:57.7396156Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7396349Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7396571Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7396576Z 
2025-12-04T12:12:57.7396784Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7397754Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7397762Z 
2025-12-04T12:12:57.7398024Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7398239Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7398370Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7398515Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7398865Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7399080Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7399178Z graph_break []
2025-12-04T12:12:57.7399406Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7400130Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7400233Z   warnings.warn(
2025-12-04T12:12:57.7400797Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7401097Z Traceback (most recent call last):
2025-12-04T12:12:57.7401580Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7401778Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7401987Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7401992Z 
2025-12-04T12:12:57.7402268Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7403189Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7403195Z 
2025-12-04T12:12:57.7403473Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7403683Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7403794Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7403921Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7404255Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7404472Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7404581Z graph_break []
2025-12-04T12:12:57.7404791Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7405529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7405632Z   warnings.warn(
2025-12-04T12:12:57.7405840Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7406062Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7406172Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7406385Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7406728Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7406869Z graph_break []
2025-12-04T12:12:57.7407092Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7407802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7407898Z   warnings.warn(
2025-12-04T12:12:57.7408096Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7408641Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7408779Z Traceback (most recent call last):
2025-12-04T12:12:57.7409245Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7409437Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7409658Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7409702Z 
2025-12-04T12:12:57.7409911Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7410829Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7410850Z 
2025-12-04T12:12:57.7411111Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7411324Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7411450Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7411562Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7411892Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7412118Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7412216Z graph_break []
2025-12-04T12:12:57.7412441Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7413161Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7413258Z   warnings.warn(
2025-12-04T12:12:57.7413478Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7413588Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7413699Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7413926Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7414256Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7414364Z graph_break []
2025-12-04T12:12:57.7414574Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7415285Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7415396Z   warnings.warn(
2025-12-04T12:12:57.7415603Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7415710Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7415832Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7416046Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7416384Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7416529Z graph_break []
2025-12-04T12:12:57.7416736Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7417454Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7417662Z   warnings.warn(
2025-12-04T12:12:57.7418466Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e90a690ff72dc1ab.xml -
2025-12-04T12:12:57.7418652Z =========================== short test summary info ============================
2025-12-04T12:12:57.7419737Z FAILED [0.1592s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7419744Z 
2025-12-04T12:12:57.7419978Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7420900Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7420905Z 
2025-12-04T12:12:57.7421217Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7421395Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7421591Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:57.7421706Z Got exit code 1
2025-12-04T12:12:57.7421814Z Retrying single test...
2025-12-04T12:12:57.7422447Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6357b547ca746444.xml
2025-12-04T12:12:57.7422621Z ============================= test session starts ==============================
2025-12-04T12:12:57.7422966Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7423086Z cachedir: .pytest_cache
2025-12-04T12:12:57.7423592Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7423718Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7423842Z configfile: pytest.ini
2025-12-04T12:12:57.7424413Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7424635Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7425652Z stepcurrent: skipping 66 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7425768Z Running 1 items in this shard
2025-12-04T12:12:57.7425773Z 
2025-12-04T12:12:57.7426669Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5591s] [100%]
2025-12-04T12:12:57.7427555Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1620s] [100%]
2025-12-04T12:12:57.7428378Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1589s] [100%]
2025-12-04T12:12:57.7428383Z 
2025-12-04T12:12:57.7428521Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7429061Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7429224Z Traceback (most recent call last):
2025-12-04T12:12:57.7429687Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7429895Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7430135Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7430141Z 
2025-12-04T12:12:57.7430349Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7431287Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7431321Z 
2025-12-04T12:12:57.7431583Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7431807Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7431919Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7432030Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7432370Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7432585Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7432710Z graph_break []
2025-12-04T12:12:57.7432939Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7433657Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7433772Z   warnings.warn(
2025-12-04T12:12:57.7434318Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7434436Z Traceback (most recent call last):
2025-12-04T12:12:57.7434907Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7435099Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7435315Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7435320Z 
2025-12-04T12:12:57.7435532Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7436451Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7436456Z 
2025-12-04T12:12:57.7436726Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7436940Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7437061Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7437174Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7437505Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7437732Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7437826Z graph_break []
2025-12-04T12:12:57.7438037Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7438770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7438866Z   warnings.warn(
2025-12-04T12:12:57.7439090Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7439197Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7439310Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7439537Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7439898Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7439993Z graph_break []
2025-12-04T12:12:57.7440217Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7440964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7441081Z   warnings.warn(
2025-12-04T12:12:57.7441224Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7441767Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7441901Z Traceback (most recent call last):
2025-12-04T12:12:57.7442460Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7442655Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7442882Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7442887Z 
2025-12-04T12:12:57.7443096Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7444034Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7444070Z 
2025-12-04T12:12:57.7444331Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7444543Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7444664Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7444779Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7445123Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7445335Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7445431Z graph_break []
2025-12-04T12:12:57.7445652Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7446372Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7446473Z   warnings.warn(
2025-12-04T12:12:57.7446693Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7446803Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7446927Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7447139Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7447470Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7447577Z graph_break []
2025-12-04T12:12:57.7447796Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7448509Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7448619Z   warnings.warn(
2025-12-04T12:12:57.7448829Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7448955Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7449066Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7449284Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7449626Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7449722Z graph_break []
2025-12-04T12:12:57.7449933Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7450658Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7450789Z   warnings.warn(
2025-12-04T12:12:57.7451605Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6357b547ca746444.xml -
2025-12-04T12:12:57.7451802Z =========================== short test summary info ============================
2025-12-04T12:12:57.7453004Z FAILED [0.1589s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7453025Z 
2025-12-04T12:12:57.7453239Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7454202Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7454210Z 
2025-12-04T12:12:57.7454488Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7454664Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7454874Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.7455027Z Got exit code 1
2025-12-04T12:12:57.7455870Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7456290Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.7456922Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e10adef85f4d6151.xml
2025-12-04T12:12:57.7457081Z ============================= test session starts ==============================
2025-12-04T12:12:57.7457442Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7457551Z cachedir: .pytest_cache
2025-12-04T12:12:57.7458077Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7458201Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7458307Z configfile: pytest.ini
2025-12-04T12:12:57.7458892Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7459114Z collecting ... collected 380 items / 67 deselected / 313 selected
2025-12-04T12:12:57.7459267Z stepcurrent: skipping 67 already run items.
2025-12-04T12:12:57.7459380Z Running 108 items in this shard
2025-12-04T12:12:57.7459385Z 
2025-12-04T12:12:57.7460387Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0041s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.7461400Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0032s] (Skip non-critical tests to save resources.) [  1%]
2025-12-04T12:12:57.7462285Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5539s] [  2%]
2025-12-04T12:12:57.7463173Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1600s] [  2%]
2025-12-04T12:12:57.7463971Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1565s] [  2%]
2025-12-04T12:12:57.7464011Z 
2025-12-04T12:12:57.7464169Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7464736Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7464859Z Traceback (most recent call last):
2025-12-04T12:12:57.7465334Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7465532Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7465738Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7465785Z 
2025-12-04T12:12:57.7465996Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7466912Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7466919Z 
2025-12-04T12:12:57.7467188Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7467403Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7467543Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7467668Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7467998Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7468224Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7468319Z graph_break []
2025-12-04T12:12:57.7468531Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7469262Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7469366Z   warnings.warn(
2025-12-04T12:12:57.7469909Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7470042Z Traceback (most recent call last):
2025-12-04T12:12:57.7470503Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7470710Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7470914Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7470919Z 
2025-12-04T12:12:57.7471126Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7472057Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7472064Z 
2025-12-04T12:12:57.7472324Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7472557Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7472665Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7472781Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7473124Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7473336Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7473446Z graph_break []
2025-12-04T12:12:57.7473657Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7474378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7474527Z   warnings.warn(
2025-12-04T12:12:57.7474735Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7474841Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7474966Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7475179Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7475549Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7475648Z graph_break []
2025-12-04T12:12:57.7475858Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7476582Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7476709Z   warnings.warn(
2025-12-04T12:12:57.7476851Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7477403Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7477522Z Traceback (most recent call last):
2025-12-04T12:12:57.7477993Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7478190Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7478432Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7478437Z 
2025-12-04T12:12:57.7478657Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7479574Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7479580Z 
2025-12-04T12:12:57.7479855Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7480065Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7480174Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7480301Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7480632Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7480849Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7480958Z graph_break []
2025-12-04T12:12:57.7481166Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7481896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7481994Z   warnings.warn(
2025-12-04T12:12:57.7482273Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7482398Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7482513Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7482729Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7483071Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7483167Z graph_break []
2025-12-04T12:12:57.7483393Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7484109Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7484207Z   warnings.warn(
2025-12-04T12:12:57.7484431Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7484539Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7484652Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7484883Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7485253Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7485361Z graph_break []
2025-12-04T12:12:57.7485570Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7486309Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7486424Z   warnings.warn(
2025-12-04T12:12:57.7487222Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e10adef85f4d6151.xml -
2025-12-04T12:12:57.7487388Z =========================== short test summary info ============================
2025-12-04T12:12:57.7488481Z FAILED [0.1565s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7488489Z 
2025-12-04T12:12:57.7488703Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7489636Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7489670Z 
2025-12-04T12:12:57.7489932Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7490122Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7490331Z ============= 1 failed, 2 skipped, 67 deselected, 2 rerun in 4.93s =============
2025-12-04T12:12:57.7490430Z Got exit code 1
2025-12-04T12:12:57.7490548Z Retrying single test...
2025-12-04T12:12:57.7491172Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-3c2c7e3f96ee06db.xml
2025-12-04T12:12:57.7491333Z ============================= test session starts ==============================
2025-12-04T12:12:57.7491687Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7491793Z cachedir: .pytest_cache
2025-12-04T12:12:57.7492311Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7492435Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7492540Z configfile: pytest.ini
2025-12-04T12:12:57.7493131Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7493355Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7494367Z stepcurrent: skipping 69 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7494480Z Running 1 items in this shard
2025-12-04T12:12:57.7494485Z 
2025-12-04T12:12:57.7495370Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5628s] [100%]
2025-12-04T12:12:57.7496262Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1626s] [100%]
2025-12-04T12:12:57.7497059Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1576s] [100%]
2025-12-04T12:12:57.7497065Z 
2025-12-04T12:12:57.7497213Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7497793Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7497924Z Traceback (most recent call last):
2025-12-04T12:12:57.7498420Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7498616Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7498834Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7498839Z 
2025-12-04T12:12:57.7499051Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7500012Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7500031Z 
2025-12-04T12:12:57.7500291Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7500505Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7500627Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7500739Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7501247Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7501550Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7501647Z graph_break []
2025-12-04T12:12:57.7501873Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7502599Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7502700Z   warnings.warn(
2025-12-04T12:12:57.7503256Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7503379Z Traceback (most recent call last):
2025-12-04T12:12:57.7503835Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7504045Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7504256Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7504261Z 
2025-12-04T12:12:57.7504484Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7505401Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7505407Z 
2025-12-04T12:12:57.7505667Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7505892Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7506004Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7506133Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7506464Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7506679Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7506792Z graph_break []
2025-12-04T12:12:57.7507001Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7507716Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7507827Z   warnings.warn(
2025-12-04T12:12:57.7508037Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7508158Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7508271Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7508540Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7508879Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7508979Z graph_break []
2025-12-04T12:12:57.7509188Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7509957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7510058Z   warnings.warn(
2025-12-04T12:12:57.7510212Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7510752Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7510914Z Traceback (most recent call last):
2025-12-04T12:12:57.7511388Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7511586Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7511792Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7511811Z 
2025-12-04T12:12:57.7512018Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7512967Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7512972Z 
2025-12-04T12:12:57.7513249Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7513462Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7513587Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7513702Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7514034Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7514266Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7514362Z graph_break []
2025-12-04T12:12:57.7514575Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7515304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7515407Z   warnings.warn(
2025-12-04T12:12:57.7515631Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7515741Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7515856Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7516089Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7516420Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7516521Z graph_break []
2025-12-04T12:12:57.7516748Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7517463Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7517580Z   warnings.warn(
2025-12-04T12:12:57.7517788Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7517896Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7518021Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7518237Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7518566Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7518681Z graph_break []
2025-12-04T12:12:57.7518891Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7519643Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7519753Z   warnings.warn(
2025-12-04T12:12:57.7520585Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-3c2c7e3f96ee06db.xml -
2025-12-04T12:12:57.7520770Z =========================== short test summary info ============================
2025-12-04T12:12:57.7521816Z FAILED [0.1576s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7521822Z 
2025-12-04T12:12:57.7522074Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7523048Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7523057Z 
2025-12-04T12:12:57.7523322Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7523512Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7523747Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.7523858Z Got exit code 1
2025-12-04T12:12:57.7523963Z Retrying single test...
2025-12-04T12:12:57.7524594Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-111a9f95bebe1e39.xml
2025-12-04T12:12:57.7524772Z ============================= test session starts ==============================
2025-12-04T12:12:57.7525120Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7525228Z cachedir: .pytest_cache
2025-12-04T12:12:57.7525751Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7525876Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7525995Z configfile: pytest.ini
2025-12-04T12:12:57.7526577Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7526803Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7527815Z stepcurrent: skipping 69 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7527933Z Running 1 items in this shard
2025-12-04T12:12:57.7527938Z 
2025-12-04T12:12:57.7528832Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5446s] [100%]
2025-12-04T12:12:57.7529716Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1608s] [100%]
2025-12-04T12:12:57.7530520Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1568s] [100%]
2025-12-04T12:12:57.7530540Z 
2025-12-04T12:12:57.7530676Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7531219Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7531350Z Traceback (most recent call last):
2025-12-04T12:12:57.7531849Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7532044Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7532262Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7532267Z 
2025-12-04T12:12:57.7532500Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7533431Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7533436Z 
2025-12-04T12:12:57.7533694Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7533937Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7534063Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7534174Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7534519Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7534734Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7534828Z graph_break []
2025-12-04T12:12:57.7535053Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7535800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7535901Z   warnings.warn(
2025-12-04T12:12:57.7536450Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7536568Z Traceback (most recent call last):
2025-12-04T12:12:57.7537040Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7537233Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7537442Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7537447Z 
2025-12-04T12:12:57.7537666Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7538581Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7538588Z 
2025-12-04T12:12:57.7538859Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7539072Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7539181Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7539309Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7539639Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7539869Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7539963Z graph_break []
2025-12-04T12:12:57.7540174Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7540908Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7541008Z   warnings.warn(
2025-12-04T12:12:57.7541217Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7541335Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7541447Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7541662Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7542005Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7542101Z graph_break []
2025-12-04T12:12:57.7542323Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7543066Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7543164Z   warnings.warn(
2025-12-04T12:12:57.7543362Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7543906Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7544040Z Traceback (most recent call last):
2025-12-04T12:12:57.7544502Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7544728Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7544948Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7544953Z 
2025-12-04T12:12:57.7545164Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7546079Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7546098Z 
2025-12-04T12:12:57.7546395Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7546605Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7546726Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7546837Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7547165Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7547392Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7547488Z graph_break []
2025-12-04T12:12:57.7547711Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7548430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7548529Z   warnings.warn(
2025-12-04T12:12:57.7548753Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7548862Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7548972Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7549194Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7549523Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7549633Z graph_break []
2025-12-04T12:12:57.7549844Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7550551Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7550663Z   warnings.warn(
2025-12-04T12:12:57.7550872Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7550980Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7551103Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7551321Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7551660Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7551754Z graph_break []
2025-12-04T12:12:57.7551961Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7552681Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7552780Z   warnings.warn(
2025-12-04T12:12:57.7553578Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-111a9f95bebe1e39.xml -
2025-12-04T12:12:57.7553792Z =========================== short test summary info ============================
2025-12-04T12:12:57.7554867Z FAILED [0.1568s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7554875Z 
2025-12-04T12:12:57.7555098Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7556038Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7556043Z 
2025-12-04T12:12:57.7556317Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7556496Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7556689Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.7556797Z Got exit code 1
2025-12-04T12:12:57.7557633Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7558076Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.7558704Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-87f44cfa0e8a9d8f.xml
2025-12-04T12:12:57.7558867Z ============================= test session starts ==============================
2025-12-04T12:12:57.7559224Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7559331Z cachedir: .pytest_cache
2025-12-04T12:12:57.7559838Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7559972Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7560083Z configfile: pytest.ini
2025-12-04T12:12:57.7560677Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7560898Z collecting ... collected 380 items / 70 deselected / 310 selected
2025-12-04T12:12:57.7561042Z stepcurrent: skipping 70 already run items.
2025-12-04T12:12:57.7561169Z Running 105 items in this shard
2025-12-04T12:12:57.7561174Z 
2025-12-04T12:12:57.7562061Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5778s] [  0%]
2025-12-04T12:12:57.7563017Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1616s] [  0%]
2025-12-04T12:12:57.7563825Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1604s] [  0%]
2025-12-04T12:12:57.7563833Z 
2025-12-04T12:12:57.7563969Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7564520Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7564637Z Traceback (most recent call last):
2025-12-04T12:12:57.7565118Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7565314Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7565560Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7565565Z 
2025-12-04T12:12:57.7565790Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7566736Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7566743Z 
2025-12-04T12:12:57.7567019Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7567233Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7567345Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7567500Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7567832Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7568062Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7568158Z graph_break []
2025-12-04T12:12:57.7568368Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7569112Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7569242Z   warnings.warn(
2025-12-04T12:12:57.7569781Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7569911Z Traceback (most recent call last):
2025-12-04T12:12:57.7570374Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7570585Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7570791Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7570798Z 
2025-12-04T12:12:57.7571009Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7571937Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7571948Z 
2025-12-04T12:12:57.7572210Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7572436Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7572544Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7572659Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7573006Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7573223Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7573321Z graph_break []
2025-12-04T12:12:57.7573551Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7574270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7574385Z   warnings.warn(
2025-12-04T12:12:57.7574598Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7574708Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7574836Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7575050Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7575380Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7575495Z graph_break []
2025-12-04T12:12:57.7575710Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7576435Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7576570Z   warnings.warn(
2025-12-04T12:12:57.7576710Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7577295Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7577417Z Traceback (most recent call last):
2025-12-04T12:12:57.7577879Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7578087Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7578293Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7578298Z 
2025-12-04T12:12:57.7578550Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7579468Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7579475Z 
2025-12-04T12:12:57.7579750Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7579965Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7580105Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7580235Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7580569Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7580786Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7580897Z graph_break []
2025-12-04T12:12:57.7581111Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7581824Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7581940Z   warnings.warn(
2025-12-04T12:12:57.7582148Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7582268Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7582380Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7582600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7582938Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7583031Z graph_break []
2025-12-04T12:12:57.7583237Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7583960Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7584059Z   warnings.warn(
2025-12-04T12:12:57.7584280Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7584389Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7584498Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7584722Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7585049Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7585147Z graph_break []
2025-12-04T12:12:57.7585364Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7586076Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7586188Z   warnings.warn(
2025-12-04T12:12:57.7586992Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-87f44cfa0e8a9d8f.xml -
2025-12-04T12:12:57.7587208Z =========================== short test summary info ============================
2025-12-04T12:12:57.7588262Z FAILED [0.1604s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7588300Z 
2025-12-04T12:12:57.7588512Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7589447Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7589452Z 
2025-12-04T12:12:57.7589746Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7589934Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7590127Z ================== 1 failed, 70 deselected, 2 rerun in 4.95s ===================
2025-12-04T12:12:57.7590226Z Got exit code 1
2025-12-04T12:12:57.7590346Z Retrying single test...
2025-12-04T12:12:57.7590973Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ffc35ad917f63350.xml
2025-12-04T12:12:57.7591161Z ============================= test session starts ==============================
2025-12-04T12:12:57.7591514Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7591619Z cachedir: .pytest_cache
2025-12-04T12:12:57.7592137Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7592258Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7592367Z configfile: pytest.ini
2025-12-04T12:12:57.7592956Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7593180Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7594172Z stepcurrent: skipping 70 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7594300Z Running 1 items in this shard
2025-12-04T12:12:57.7594305Z 
2025-12-04T12:12:57.7595183Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5346s] [100%]
2025-12-04T12:12:57.7596073Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1612s] [100%]
2025-12-04T12:12:57.7596869Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1567s] [100%]
2025-12-04T12:12:57.7596874Z 
2025-12-04T12:12:57.7597024Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7597563Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7597684Z Traceback (most recent call last):
2025-12-04T12:12:57.7598161Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7598354Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7598577Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7598583Z 
2025-12-04T12:12:57.7598791Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7599818Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7599823Z 
2025-12-04T12:12:57.7600103Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7600355Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7600481Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7600596Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7601090Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7601326Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7601425Z graph_break []
2025-12-04T12:12:57.7601703Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7602527Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7602632Z   warnings.warn(
2025-12-04T12:12:57.7603187Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7603356Z Traceback (most recent call last):
2025-12-04T12:12:57.7603825Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7604032Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7604240Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7604245Z 
2025-12-04T12:12:57.7604470Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7605382Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7605390Z 
2025-12-04T12:12:57.7605650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7605874Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7605989Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7612585Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7613018Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7613243Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7613357Z graph_break []
2025-12-04T12:12:57.7613579Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7614322Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7614442Z   warnings.warn(
2025-12-04T12:12:57.7614656Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7614767Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7614893Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7615112Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7615464Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7615562Z graph_break []
2025-12-04T12:12:57.7615772Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7616507Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7616607Z   warnings.warn(
2025-12-04T12:12:57.7616750Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7617430Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7617550Z Traceback (most recent call last):
2025-12-04T12:12:57.7618027Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7618270Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7618481Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7618490Z 
2025-12-04T12:12:57.7618714Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7619670Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7619677Z 
2025-12-04T12:12:57.7619962Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7620180Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7620293Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7620411Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7620744Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7621000Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7621109Z graph_break []
2025-12-04T12:12:57.7621319Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7622046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7622146Z   warnings.warn(
2025-12-04T12:12:57.7622354Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7622474Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7622586Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7622799Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7623142Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7623237Z graph_break []
2025-12-04T12:12:57.7623464Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7624183Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7624284Z   warnings.warn(
2025-12-04T12:12:57.7624506Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7624615Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7624725Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7624952Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7625280Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7625388Z graph_break []
2025-12-04T12:12:57.7625595Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7626304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7626418Z   warnings.warn(
2025-12-04T12:12:57.7627217Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ffc35ad917f63350.xml -
2025-12-04T12:12:57.7627385Z =========================== short test summary info ============================
2025-12-04T12:12:57.7628454Z FAILED [0.1567s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7628491Z 
2025-12-04T12:12:57.7628705Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7629667Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7629675Z 
2025-12-04T12:12:57.7629937Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7630120Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7630313Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ==================
2025-12-04T12:12:57.7630411Z Got exit code 1
2025-12-04T12:12:57.7630557Z Retrying single test...
2025-12-04T12:12:57.7631186Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bb2bca61f02d857f.xml
2025-12-04T12:12:57.7631348Z ============================= test session starts ==============================
2025-12-04T12:12:57.7631701Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7631807Z cachedir: .pytest_cache
2025-12-04T12:12:57.7632381Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7632500Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7632606Z configfile: pytest.ini
2025-12-04T12:12:57.7633196Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7633418Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7634427Z stepcurrent: skipping 70 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7634544Z Running 1 items in this shard
2025-12-04T12:12:57.7634549Z 
2025-12-04T12:12:57.7635438Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5494s] [100%]
2025-12-04T12:12:57.7636338Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1614s] [100%]
2025-12-04T12:12:57.7637140Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1574s] [100%]
2025-12-04T12:12:57.7637146Z 
2025-12-04T12:12:57.7637300Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7637841Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7637975Z Traceback (most recent call last):
2025-12-04T12:12:57.7638436Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7638630Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7638849Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7638854Z 
2025-12-04T12:12:57.7639064Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7639979Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7639996Z 
2025-12-04T12:12:57.7640288Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7640505Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7640628Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7640740Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7641105Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7641337Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7641436Z graph_break []
2025-12-04T12:12:57.7641659Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7642497Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7642601Z   warnings.warn(
2025-12-04T12:12:57.7643154Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7643277Z Traceback (most recent call last):
2025-12-04T12:12:57.7643741Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7643952Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7644194Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7644199Z 
2025-12-04T12:12:57.7644426Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7645344Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7645349Z 
2025-12-04T12:12:57.7645615Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7645842Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7645957Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7646085Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7646415Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7646631Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7646745Z graph_break []
2025-12-04T12:12:57.7646955Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7647675Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7647786Z   warnings.warn(
2025-12-04T12:12:57.7647998Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7648123Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7648236Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7648451Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7648791Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7648886Z graph_break []
2025-12-04T12:12:57.7649094Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7649818Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7649917Z   warnings.warn(
2025-12-04T12:12:57.7650071Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7650615Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7650733Z Traceback (most recent call last):
2025-12-04T12:12:57.7651207Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7651433Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7651639Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7651657Z 
2025-12-04T12:12:57.7651864Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7652811Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7652817Z 
2025-12-04T12:12:57.7653085Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7653325Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7653449Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7653560Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7653892Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7654122Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7654218Z graph_break []
2025-12-04T12:12:57.7654430Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7655192Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7655294Z   warnings.warn(
2025-12-04T12:12:57.7655501Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7655622Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7655731Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7655959Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7656289Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7656386Z graph_break []
2025-12-04T12:12:57.7656610Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7657320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7657420Z   warnings.warn(
2025-12-04T12:12:57.7657643Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7657751Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7657872Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7658085Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7658413Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7658518Z graph_break []
2025-12-04T12:12:57.7658724Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7659434Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7659542Z   warnings.warn(
2025-12-04T12:12:57.7660343Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bb2bca61f02d857f.xml -
2025-12-04T12:12:57.7660522Z =========================== short test summary info ============================
2025-12-04T12:12:57.7661568Z FAILED [0.1574s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7661576Z 
2025-12-04T12:12:57.7661799Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7662715Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7662766Z 
2025-12-04T12:12:57.7663029Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7663244Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7663440Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.7663552Z Got exit code 1
2025-12-04T12:12:57.7664375Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7664807Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.7665446Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-17f448aea025f304.xml
2025-12-04T12:12:57.7665608Z ============================= test session starts ==============================
2025-12-04T12:12:57.7665962Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7666069Z cachedir: .pytest_cache
2025-12-04T12:12:57.7666609Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7666744Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7666849Z configfile: pytest.ini
2025-12-04T12:12:57.7667426Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7667662Z collecting ... collected 380 items / 71 deselected / 309 selected
2025-12-04T12:12:57.7667801Z stepcurrent: skipping 71 already run items.
2025-12-04T12:12:57.7667927Z Running 104 items in this shard
2025-12-04T12:12:57.7667931Z 
2025-12-04T12:12:57.7668814Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5602s] [  0%]
2025-12-04T12:12:57.7669687Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1665s] [  0%]
2025-12-04T12:12:57.7670494Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1618s] [  0%]
2025-12-04T12:12:57.7670500Z 
2025-12-04T12:12:57.7670639Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7671185Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.7671307Z Traceback (most recent call last):
2025-12-04T12:12:57.7671770Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7671976Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7672187Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7672192Z 
2025-12-04T12:12:57.7672415Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7673328Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7673333Z 
2025-12-04T12:12:57.7673592Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7673819Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7673963Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7674090Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7674423Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7674639Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7674779Z graph_break []
2025-12-04T12:12:57.7674991Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7677699Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7677809Z   return x.grad, w.grad
2025-12-04T12:12:57.7678523Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7678671Z   warnings.warn(
2025-12-04T12:12:57.7681304Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7681424Z   return x.grad, w.grad
2025-12-04T12:12:57.7681955Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.7682089Z Traceback (most recent call last):
2025-12-04T12:12:57.7682618Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7682817Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7683038Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7683044Z 
2025-12-04T12:12:57.7683254Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7684180Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7684188Z 
2025-12-04T12:12:57.7684444Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7684652Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7684775Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7684884Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7685219Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7685446Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7685541Z graph_break []
2025-12-04T12:12:57.7685762Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7688416Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7688570Z   return x.grad, w.grad
2025-12-04T12:12:57.7689317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7689417Z   warnings.warn(
2025-12-04T12:12:57.7692088Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7692194Z   return x.grad, w.grad
2025-12-04T12:12:57.7692422Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7692560Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7692672Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7692905Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7693234Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7693346Z graph_break []
2025-12-04T12:12:57.7693560Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7696200Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7696322Z   return x.grad, w.grad
2025-12-04T12:12:57.7697035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7697144Z   warnings.warn(
2025-12-04T12:12:57.7699805Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7699927Z   return x.grad, w.grad
2025-12-04T12:12:57.7700068Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7700602Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.7700733Z Traceback (most recent call last):
2025-12-04T12:12:57.7701486Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7701696Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7701980Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7701985Z 
2025-12-04T12:12:57.7702195Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7703157Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7703167Z 
2025-12-04T12:12:57.7703430Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7703652Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7703761Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7703874Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7704261Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7704480Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7704580Z graph_break []
2025-12-04T12:12:57.7704801Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7707454Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7707616Z   return x.grad, w.grad
2025-12-04T12:12:57.7708337Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7708454Z   warnings.warn(
2025-12-04T12:12:57.7711088Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7711211Z   return x.grad, w.grad
2025-12-04T12:12:57.7711426Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7711534Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7711659Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7711877Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7712214Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7712325Z graph_break []
2025-12-04T12:12:57.7712537Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7715205Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7715307Z   return x.grad, w.grad
2025-12-04T12:12:57.7716063Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7716160Z   warnings.warn(
2025-12-04T12:12:57.7718852Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7718970Z   return x.grad, w.grad
2025-12-04T12:12:57.7719183Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7719307Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7719417Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7719633Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7719975Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7720104Z graph_break []
2025-12-04T12:12:57.7720328Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7721042Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7721139Z   warnings.warn(
2025-12-04T12:12:57.7723859Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7723973Z   return x.grad, w.grad
2025-12-04T12:12:57.7724791Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-17f448aea025f304.xml -
2025-12-04T12:12:57.7724962Z =========================== short test summary info ============================
2025-12-04T12:12:57.7726002Z FAILED [0.1618s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7726025Z 
2025-12-04T12:12:57.7726237Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7727147Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7727155Z 
2025-12-04T12:12:57.7727436Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7727613Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7727819Z ================== 1 failed, 71 deselected, 2 rerun in 4.94s ===================
2025-12-04T12:12:57.7727916Z Got exit code 1
2025-12-04T12:12:57.7728019Z Retrying single test...
2025-12-04T12:12:57.7728658Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85582e9ee40ebc55.xml
2025-12-04T12:12:57.7728817Z ============================= test session starts ==============================
2025-12-04T12:12:57.7729199Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7729316Z cachedir: .pytest_cache
2025-12-04T12:12:57.7729867Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7730004Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7730108Z configfile: pytest.ini
2025-12-04T12:12:57.7730682Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7730918Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7731938Z stepcurrent: skipping 71 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7732064Z Running 1 items in this shard
2025-12-04T12:12:57.7732069Z 
2025-12-04T12:12:57.7732939Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5479s] [100%]
2025-12-04T12:12:57.7733842Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1655s] [100%]
2025-12-04T12:12:57.7734646Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1612s] [100%]
2025-12-04T12:12:57.7734652Z 
2025-12-04T12:12:57.7734790Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7735339Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.7735458Z Traceback (most recent call last):
2025-12-04T12:12:57.7735920Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7736127Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7736337Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7736342Z 
2025-12-04T12:12:57.7736563Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7737474Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7737482Z 
2025-12-04T12:12:57.7737757Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7737970Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7738085Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7738210Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7738541Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7738761Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7738871Z graph_break []
2025-12-04T12:12:57.7739079Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7741736Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7741875Z   return x.grad, w.grad
2025-12-04T12:12:57.7742620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7742732Z   warnings.warn(
2025-12-04T12:12:57.7745397Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7745519Z   return x.grad, w.grad
2025-12-04T12:12:57.7746053Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.7746184Z Traceback (most recent call last):
2025-12-04T12:12:57.7746673Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7746866Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7747084Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7747089Z 
2025-12-04T12:12:57.7747297Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7748221Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7748229Z 
2025-12-04T12:12:57.7748489Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7748701Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7748823Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7748938Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7749282Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7749498Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7749596Z graph_break []
2025-12-04T12:12:57.7749818Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7752464Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7752588Z   return x.grad, w.grad
2025-12-04T12:12:57.7753297Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7753396Z   warnings.warn(
2025-12-04T12:12:57.7756051Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7756190Z   return x.grad, w.grad
2025-12-04T12:12:57.7756444Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7756555Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7756667Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7756898Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7757229Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7757339Z graph_break []
2025-12-04T12:12:57.7757578Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7760228Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7760380Z   return x.grad, w.grad
2025-12-04T12:12:57.7761097Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7761209Z   warnings.warn(
2025-12-04T12:12:57.7763908Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7764035Z   return x.grad, w.grad
2025-12-04T12:12:57.7764179Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7764736Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.7764857Z Traceback (most recent call last):
2025-12-04T12:12:57.7765319Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7765531Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7765744Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7765750Z 
2025-12-04T12:12:57.7765974Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7766893Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7766901Z 
2025-12-04T12:12:57.7767162Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7767383Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7767493Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7767601Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7767944Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7768161Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7768301Z graph_break []
2025-12-04T12:12:57.7768515Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7771195Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7771340Z   return x.grad, w.grad
2025-12-04T12:12:57.7772056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7772165Z   warnings.warn(
2025-12-04T12:12:57.7774921Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7775087Z   return x.grad, w.grad
2025-12-04T12:12:57.7775305Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7775412Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7775538Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7775755Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7776097Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7776193Z graph_break []
2025-12-04T12:12:57.7776401Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7779057Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7779162Z   return x.grad, w.grad
2025-12-04T12:12:57.7779890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7779986Z   warnings.warn(
2025-12-04T12:12:57.7782638Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7782743Z   return x.grad, w.grad
2025-12-04T12:12:57.7782952Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7783130Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7783242Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7783470Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7783836Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7783934Z graph_break []
2025-12-04T12:12:57.7784157Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7784875Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7784972Z   warnings.warn(
2025-12-04T12:12:57.7787686Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7787819Z   return x.grad, w.grad
2025-12-04T12:12:57.7788639Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85582e9ee40ebc55.xml -
2025-12-04T12:12:57.7788805Z =========================== short test summary info ============================
2025-12-04T12:12:57.7789857Z FAILED [0.1612s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7789865Z 
2025-12-04T12:12:57.7790074Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7790999Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7791006Z 
2025-12-04T12:12:57.7791265Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7791440Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7791640Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.7791734Z Got exit code 1
2025-12-04T12:12:57.7791839Z Retrying single test...
2025-12-04T12:12:57.7792470Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c795322010e61bce.xml
2025-12-04T12:12:57.7792628Z ============================= test session starts ==============================
2025-12-04T12:12:57.7792978Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7793082Z cachedir: .pytest_cache
2025-12-04T12:12:57.7793589Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7793722Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7793827Z configfile: pytest.ini
2025-12-04T12:12:57.7794405Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7794641Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7795631Z stepcurrent: skipping 71 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7795801Z Running 1 items in this shard
2025-12-04T12:12:57.7795806Z 
2025-12-04T12:12:57.7796717Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5620s] [100%]
2025-12-04T12:12:57.7797602Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1681s] [100%]
2025-12-04T12:12:57.7798400Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1648s] [100%]
2025-12-04T12:12:57.7798435Z 
2025-12-04T12:12:57.7798574Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7799121Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.7799240Z Traceback (most recent call last):
2025-12-04T12:12:57.7799715Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7800017Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7800225Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7800230Z 
2025-12-04T12:12:57.7800448Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7801519Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7801525Z 
2025-12-04T12:12:57.7801794Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7802007Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7802180Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7802309Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7802641Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7802871Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7802967Z graph_break []
2025-12-04T12:12:57.7803176Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7805839Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7805945Z   return x.grad, w.grad
2025-12-04T12:12:57.7806678Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7806778Z   warnings.warn(
2025-12-04T12:12:57.7809427Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7809611Z   return x.grad, w.grad
2025-12-04T12:12:57.7810144Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.7810322Z Traceback (most recent call last):
2025-12-04T12:12:57.7810781Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7810989Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7811194Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7811201Z 
2025-12-04T12:12:57.7811412Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7812380Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7812388Z 
2025-12-04T12:12:57.7812649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7812876Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7812984Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7813140Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7813487Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7813696Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7813790Z graph_break []
2025-12-04T12:12:57.7814010Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7816654Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7816774Z   return x.grad, w.grad
2025-12-04T12:12:57.7817491Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7817604Z   warnings.warn(
2025-12-04T12:12:57.7820237Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7820354Z   return x.grad, w.grad
2025-12-04T12:12:57.7820570Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7820680Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7820802Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7821020Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7821350Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7821457Z graph_break []
2025-12-04T12:12:57.7821668Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7824342Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7824477Z   return x.grad, w.grad
2025-12-04T12:12:57.7825205Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7825331Z   warnings.warn(
2025-12-04T12:12:57.7827965Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7828123Z   return x.grad, w.grad
2025-12-04T12:12:57.7828263Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7828807Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.7828927Z Traceback (most recent call last):
2025-12-04T12:12:57.7829386Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7829591Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7829797Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7829802Z 
2025-12-04T12:12:57.7830020Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7830938Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7830946Z 
2025-12-04T12:12:57.7831204Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7831425Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7831533Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7831658Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7831988Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7832204Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7832310Z graph_break []
2025-12-04T12:12:57.7832521Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7835179Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7835282Z   return x.grad, w.grad
2025-12-04T12:12:57.7835991Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7836134Z   warnings.warn(
2025-12-04T12:12:57.7838793Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7838908Z   return x.grad, w.grad
2025-12-04T12:12:57.7839151Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7839271Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7839383Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7839602Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7839942Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7840037Z graph_break []
2025-12-04T12:12:57.7840247Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7842990Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7843094Z   return x.grad, w.grad
2025-12-04T12:12:57.7843817Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7843914Z   warnings.warn(
2025-12-04T12:12:57.7846567Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7846670Z   return x.grad, w.grad
2025-12-04T12:12:57.7846884Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7847005Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7847116Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7847347Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7847677Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7847774Z graph_break []
2025-12-04T12:12:57.7847992Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7848703Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7848797Z   warnings.warn(
2025-12-04T12:12:57.7851490Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.7851625Z   return x.grad, w.grad
2025-12-04T12:12:57.7852434Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c795322010e61bce.xml -
2025-12-04T12:12:57.7852598Z =========================== short test summary info ============================
2025-12-04T12:12:57.7853680Z FAILED [0.1648s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7853689Z 
2025-12-04T12:12:57.7853901Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7854822Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7854858Z 
2025-12-04T12:12:57.7855116Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7855286Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7855492Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ==================
2025-12-04T12:12:57.7855586Z Got exit code 1
2025-12-04T12:12:57.7856423Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.7856826Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.7857448Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-af1ce6171d14e609.xml
2025-12-04T12:12:57.7857625Z ============================= test session starts ==============================
2025-12-04T12:12:57.7857968Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7858073Z cachedir: .pytest_cache
2025-12-04T12:12:57.7858593Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7858716Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7858835Z configfile: pytest.ini
2025-12-04T12:12:57.7859412Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7859631Z collecting ... collected 380 items / 72 deselected / 308 selected
2025-12-04T12:12:57.7859783Z stepcurrent: skipping 72 already run items.
2025-12-04T12:12:57.7859894Z Running 103 items in this shard
2025-12-04T12:12:57.7859898Z 
2025-12-04T12:12:57.7860906Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  0%]
2025-12-04T12:12:57.7861899Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0028s] (Skip non-critical tests to save resources.) [  1%]
2025-12-04T12:12:57.7862875Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0035s] (Skip non-critical tests to save resources.) [  2%]
2025-12-04T12:12:57.7863797Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5402s] [  3%]
2025-12-04T12:12:57.7864699Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1601s] [  3%]
2025-12-04T12:12:57.7865510Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1587s] [  3%]
2025-12-04T12:12:57.7865515Z 
2025-12-04T12:12:57.7865681Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7866231Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7866353Z Traceback (most recent call last):
2025-12-04T12:12:57.7866813Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7867019Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7867257Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7867262Z 
2025-12-04T12:12:57.7867483Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7868412Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7868417Z 
2025-12-04T12:12:57.7868685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7868912Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7869023Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7869147Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7869479Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7869696Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7869804Z graph_break []
2025-12-04T12:12:57.7870013Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7870729Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7870841Z   warnings.warn(
2025-12-04T12:12:57.7871381Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7871512Z Traceback (most recent call last):
2025-12-04T12:12:57.7871970Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7872166Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7872384Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7872389Z 
2025-12-04T12:12:57.7872600Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7873532Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7873537Z 
2025-12-04T12:12:57.7873800Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7874012Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7874129Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7874242Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7874621Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7874847Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7874945Z graph_break []
2025-12-04T12:12:57.7875193Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7875909Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7876007Z   warnings.warn(
2025-12-04T12:12:57.7876223Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7876329Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7876485Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7876714Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7877044Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7877150Z graph_break []
2025-12-04T12:12:57.7877355Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7878064Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7878206Z   warnings.warn(
2025-12-04T12:12:57.7878345Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7878899Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7879016Z Traceback (most recent call last):
2025-12-04T12:12:57.7879476Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7879675Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7879881Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7879886Z 
2025-12-04T12:12:57.7880092Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7881020Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7881028Z 
2025-12-04T12:12:57.7881287Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7881509Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7881623Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7881736Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7882080Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7882362Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7882475Z graph_break []
2025-12-04T12:12:57.7882685Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7883405Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7883519Z   warnings.warn(
2025-12-04T12:12:57.7883728Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7883834Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7883956Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7884169Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7884513Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7884612Z graph_break []
2025-12-04T12:12:57.7884821Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7885590Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7885687Z   warnings.warn(
2025-12-04T12:12:57.7885923Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7886048Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7886161Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7886377Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7886717Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7886811Z graph_break []
2025-12-04T12:12:57.7887062Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7887773Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7887872Z   warnings.warn(
2025-12-04T12:12:57.7888675Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-af1ce6171d14e609.xml -
2025-12-04T12:12:57.7888842Z =========================== short test summary info ============================
2025-12-04T12:12:57.7889932Z FAILED [0.1587s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7889938Z 
2025-12-04T12:12:57.7890152Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7891066Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7891086Z 
2025-12-04T12:12:57.7891344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7891519Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7891743Z ============= 1 failed, 3 skipped, 72 deselected, 2 rerun in 4.92s =============
2025-12-04T12:12:57.7891841Z Got exit code 1
2025-12-04T12:12:57.7891942Z Retrying single test...
2025-12-04T12:12:57.7892575Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-00b52dc1e610ac68.xml
2025-12-04T12:12:57.7892731Z ============================= test session starts ==============================
2025-12-04T12:12:57.7893084Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7893188Z cachedir: .pytest_cache
2025-12-04T12:12:57.7893698Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7893829Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7893933Z configfile: pytest.ini
2025-12-04T12:12:57.7894504Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7894740Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7895738Z stepcurrent: skipping 75 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7895862Z Running 1 items in this shard
2025-12-04T12:12:57.7895867Z 
2025-12-04T12:12:57.7896754Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5548s] [100%]
2025-12-04T12:12:57.7897678Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1609s] [100%]
2025-12-04T12:12:57.7898517Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1573s] [100%]
2025-12-04T12:12:57.7898525Z 
2025-12-04T12:12:57.7898660Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7899206Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7899354Z Traceback (most recent call last):
2025-12-04T12:12:57.7899831Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7900024Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7900231Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7900236Z 
2025-12-04T12:12:57.7900456Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7901547Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7901621Z 
2025-12-04T12:12:57.7901896Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7902110Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7902220Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7902352Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7902683Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7902899Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7903008Z graph_break []
2025-12-04T12:12:57.7903223Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7903958Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7904057Z   warnings.warn(
2025-12-04T12:12:57.7904596Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7904726Z Traceback (most recent call last):
2025-12-04T12:12:57.7905186Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7905390Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7905593Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7905601Z 
2025-12-04T12:12:57.7905808Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7906739Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7906746Z 
2025-12-04T12:12:57.7907003Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7907224Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7907336Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7907449Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7907791Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7908002Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7908146Z graph_break []
2025-12-04T12:12:57.7908371Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7909086Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7909195Z   warnings.warn(
2025-12-04T12:12:57.7909445Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7909555Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7909681Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7909895Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7910223Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7910373Z graph_break []
2025-12-04T12:12:57.7910583Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7911304Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7911405Z   warnings.warn(
2025-12-04T12:12:57.7911545Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7912090Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7912240Z Traceback (most recent call last):
2025-12-04T12:12:57.7912699Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7912902Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7913109Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7913115Z 
2025-12-04T12:12:57.7913334Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7914245Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7914253Z 
2025-12-04T12:12:57.7914509Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7914737Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7914845Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7914969Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7915298Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7915513Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7915616Z graph_break []
2025-12-04T12:12:57.7915823Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7916536Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7916648Z   warnings.warn(
2025-12-04T12:12:57.7916867Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7916988Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7917102Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7917318Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7917661Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7917757Z graph_break []
2025-12-04T12:12:57.7917965Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7918695Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7918796Z   warnings.warn(
2025-12-04T12:12:57.7919056Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7919166Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7919278Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7919508Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7919886Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7919983Z graph_break []
2025-12-04T12:12:57.7920205Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7920919Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7921032Z   warnings.warn(
2025-12-04T12:12:57.7921863Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-00b52dc1e610ac68.xml -
2025-12-04T12:12:57.7922035Z =========================== short test summary info ============================
2025-12-04T12:12:57.7923172Z FAILED [0.1573s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7923217Z 
2025-12-04T12:12:57.7923429Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7924363Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7924368Z 
2025-12-04T12:12:57.7924631Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7924810Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7925024Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.7925124Z Got exit code 1
2025-12-04T12:12:57.7925245Z Retrying single test...
2025-12-04T12:12:57.7925875Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-40be700c41c1be61.xml
2025-12-04T12:12:57.7926038Z ============================= test session starts ==============================
2025-12-04T12:12:57.7926391Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7926499Z cachedir: .pytest_cache
2025-12-04T12:12:57.7927018Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7927139Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7927246Z configfile: pytest.ini
2025-12-04T12:12:57.7927835Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7928059Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7929057Z stepcurrent: skipping 75 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7929184Z Running 1 items in this shard
2025-12-04T12:12:57.7929190Z 
2025-12-04T12:12:57.7930073Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5263s] [100%]
2025-12-04T12:12:57.7930964Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1605s] [100%]
2025-12-04T12:12:57.7931803Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1582s] [100%]
2025-12-04T12:12:57.7931809Z 
2025-12-04T12:12:57.7931960Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7932536Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7932658Z Traceback (most recent call last):
2025-12-04T12:12:57.7933129Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7933322Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7933572Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7933578Z 
2025-12-04T12:12:57.7933787Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7934698Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7934703Z 
2025-12-04T12:12:57.7934976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7935222Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7935343Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7935453Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7935783Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7936011Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7936108Z graph_break []
2025-12-04T12:12:57.7936320Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7937056Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7937152Z   warnings.warn(
2025-12-04T12:12:57.7937702Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7937824Z Traceback (most recent call last):
2025-12-04T12:12:57.7938282Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7938487Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7938692Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7938697Z 
2025-12-04T12:12:57.7938906Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7939834Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7939841Z 
2025-12-04T12:12:57.7940103Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7940327Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7940440Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7940554Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7940894Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7941108Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7941215Z graph_break []
2025-12-04T12:12:57.7941427Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7942142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7942302Z   warnings.warn(
2025-12-04T12:12:57.7942513Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7942623Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7942744Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7942986Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7943332Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7943424Z graph_break []
2025-12-04T12:12:57.7943633Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7944390Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7944490Z   warnings.warn(
2025-12-04T12:12:57.7944629Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7945186Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.7945303Z Traceback (most recent call last):
2025-12-04T12:12:57.7945775Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7946022Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7946232Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7946237Z 
2025-12-04T12:12:57.7946458Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7947378Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7947383Z 
2025-12-04T12:12:57.7947654Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7947865Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7947973Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7948098Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7948429Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7948658Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7948754Z graph_break []
2025-12-04T12:12:57.7948964Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7949696Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7949794Z   warnings.warn(
2025-12-04T12:12:57.7950005Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7950128Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7950238Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7950449Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7950789Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7950889Z graph_break []
2025-12-04T12:12:57.7951111Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7951820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7951916Z   warnings.warn(
2025-12-04T12:12:57.7952137Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7952247Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7952358Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7952581Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7952943Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7953052Z graph_break []
2025-12-04T12:12:57.7953259Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7953996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7954105Z   warnings.warn(
2025-12-04T12:12:57.7954903Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-40be700c41c1be61.xml -
2025-12-04T12:12:57.7955113Z =========================== short test summary info ============================
2025-12-04T12:12:57.7956156Z FAILED [0.1582s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7956164Z 
2025-12-04T12:12:57.7956375Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7957304Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7957341Z 
2025-12-04T12:12:57.7957601Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7957790Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7957984Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ==================
2025-12-04T12:12:57.7958081Z Got exit code 1
2025-12-04T12:12:57.7958923Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.7959322Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.7959957Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-063cd6c16f492c0b.xml
2025-12-04T12:12:57.7960120Z ============================= test session starts ==============================
2025-12-04T12:12:57.7960460Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7960577Z cachedir: .pytest_cache
2025-12-04T12:12:57.7961083Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7961214Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7961319Z configfile: pytest.ini
2025-12-04T12:12:57.7961894Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7962195Z collecting ... collected 380 items / 76 deselected / 304 selected
2025-12-04T12:12:57.7962339Z stepcurrent: skipping 76 already run items.
2025-12-04T12:12:57.7962449Z Running 99 items in this shard
2025-12-04T12:12:57.7962460Z 
2025-12-04T12:12:57.7963359Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5334s] [  1%]
2025-12-04T12:12:57.7964234Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1607s] [  1%]
2025-12-04T12:12:57.7965044Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1567s] [  1%]
2025-12-04T12:12:57.7965110Z 
2025-12-04T12:12:57.7965247Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7965797Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7965949Z Traceback (most recent call last):
2025-12-04T12:12:57.7966410Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7966616Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7966823Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7966828Z 
2025-12-04T12:12:57.7967081Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7968007Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7968014Z 
2025-12-04T12:12:57.7968275Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7968501Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7968643Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7968756Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7969102Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7969318Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7969431Z graph_break []
2025-12-04T12:12:57.7969642Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7970365Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7970477Z   warnings.warn(
2025-12-04T12:12:57.7971012Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7971141Z Traceback (most recent call last):
2025-12-04T12:12:57.7971605Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7971800Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7972017Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7972022Z 
2025-12-04T12:12:57.7972226Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7973140Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7973160Z 
2025-12-04T12:12:57.7973419Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7973630Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7973748Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7973858Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7974193Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7974418Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7974516Z graph_break []
2025-12-04T12:12:57.7974743Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7975459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7975557Z   warnings.warn(
2025-12-04T12:12:57.7975775Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7975923Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7976033Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7976260Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7976618Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7976733Z graph_break []
2025-12-04T12:12:57.7976942Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7977650Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7977762Z   warnings.warn(
2025-12-04T12:12:57.7977932Z =================================== FAILURES ===================================
2025-12-04T12:12:57.7978478Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7978612Z Traceback (most recent call last):
2025-12-04T12:12:57.7979074Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7979284Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7979522Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7979528Z 
2025-12-04T12:12:57.7979737Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7980670Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7980675Z 
2025-12-04T12:12:57.7980940Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7981167Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7981281Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7981394Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7981741Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7981954Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7982054Z graph_break []
2025-12-04T12:12:57.7982278Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7982999Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7983116Z   warnings.warn(
2025-12-04T12:12:57.7983327Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7983436Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7983561Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7983774Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7984106Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7984220Z graph_break []
2025-12-04T12:12:57.7984430Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7985158Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7985257Z   warnings.warn(
2025-12-04T12:12:57.7985465Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.7985588Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.7985702Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.7985917Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.7986257Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.7986391Z graph_break []
2025-12-04T12:12:57.7986614Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.7987323Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.7987453Z   warnings.warn(
2025-12-04T12:12:57.7988270Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-063cd6c16f492c0b.xml -
2025-12-04T12:12:57.7988437Z =========================== short test summary info ============================
2025-12-04T12:12:57.7989605Z FAILED [0.1567s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7989614Z 
2025-12-04T12:12:57.7989826Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.7990737Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7990775Z 
2025-12-04T12:12:57.7991051Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.7991224Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.7991431Z ================== 1 failed, 76 deselected, 2 rerun in 4.90s ===================
2025-12-04T12:12:57.7991528Z Got exit code 1
2025-12-04T12:12:57.7991632Z Retrying single test...
2025-12-04T12:12:57.7992274Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cdb46a62f836b20.xml
2025-12-04T12:12:57.7992433Z ============================= test session starts ==============================
2025-12-04T12:12:57.7992776Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.7992896Z cachedir: .pytest_cache
2025-12-04T12:12:57.7993405Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.7993540Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.7993648Z configfile: pytest.ini
2025-12-04T12:12:57.7994224Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.7994460Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.7995459Z stepcurrent: skipping 76 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.7995589Z Running 1 items in this shard
2025-12-04T12:12:57.7995593Z 
2025-12-04T12:12:57.7996472Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5653s] [100%]
2025-12-04T12:12:57.7997353Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1609s] [100%]
2025-12-04T12:12:57.7998162Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1579s] [100%]
2025-12-04T12:12:57.7998167Z 
2025-12-04T12:12:57.7998311Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.7998862Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.7999018Z Traceback (most recent call last):
2025-12-04T12:12:57.7999493Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.7999712Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.7999922Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.7999926Z 
2025-12-04T12:12:57.8000147Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8001234Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8001324Z 
2025-12-04T12:12:57.8001601Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8001815Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8001927Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8002053Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8002446Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8002665Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8002840Z graph_break []
2025-12-04T12:12:57.8003050Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8003785Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8003883Z   warnings.warn(
2025-12-04T12:12:57.8004420Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8004551Z Traceback (most recent call last):
2025-12-04T12:12:57.8005011Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8005202Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8005423Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8005430Z 
2025-12-04T12:12:57.8005639Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8006560Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8006565Z 
2025-12-04T12:12:57.8006824Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8007036Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8007165Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8007278Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8007620Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8007833Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8007929Z graph_break []
2025-12-04T12:12:57.8008154Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8008870Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8008968Z   warnings.warn(
2025-12-04T12:12:57.8009189Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8009295Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8009420Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8009633Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8010008Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8010117Z graph_break []
2025-12-04T12:12:57.8010327Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8011079Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8011192Z   warnings.warn(
2025-12-04T12:12:57.8011330Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8011879Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8012001Z Traceback (most recent call last):
2025-12-04T12:12:57.8012491Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8012699Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8012909Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8012914Z 
2025-12-04T12:12:57.8013134Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8014051Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8014085Z 
2025-12-04T12:12:57.8014344Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8014565Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8014673Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8014785Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8015126Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8015341Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8015450Z graph_break []
2025-12-04T12:12:57.8015658Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8016374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8016488Z   warnings.warn(
2025-12-04T12:12:57.8016696Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8016816Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8016927Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8017140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8017485Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8017579Z graph_break []
2025-12-04T12:12:57.8017785Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8018504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8018600Z   warnings.warn(
2025-12-04T12:12:57.8018809Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8018930Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8019040Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8019264Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8019589Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8019682Z graph_break []
2025-12-04T12:12:57.8019904Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8020609Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8020745Z   warnings.warn(
2025-12-04T12:12:57.8021557Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cdb46a62f836b20.xml -
2025-12-04T12:12:57.8022318Z =========================== short test summary info ============================
2025-12-04T12:12:57.8023385Z FAILED [0.1579s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8023391Z 
2025-12-04T12:12:57.8023599Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8024563Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8024573Z 
2025-12-04T12:12:57.8024832Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8025006Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8025217Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.8025346Z Got exit code 1
2025-12-04T12:12:57.8025463Z Retrying single test...
2025-12-04T12:12:57.8026088Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-50364e1db5a413f2.xml
2025-12-04T12:12:57.8026250Z ============================= test session starts ==============================
2025-12-04T12:12:57.8026607Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8026714Z cachedir: .pytest_cache
2025-12-04T12:12:57.8027224Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8027361Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8027466Z configfile: pytest.ini
2025-12-04T12:12:57.8028058Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8028282Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8029277Z stepcurrent: skipping 76 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8029400Z Running 1 items in this shard
2025-12-04T12:12:57.8029407Z 
2025-12-04T12:12:57.8030285Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5502s] [100%]
2025-12-04T12:12:57.8031170Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1603s] [100%]
2025-12-04T12:12:57.8031968Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1563s] [100%]
2025-12-04T12:12:57.8031975Z 
2025-12-04T12:12:57.8032123Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8032663Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8032783Z Traceback (most recent call last):
2025-12-04T12:12:57.8033256Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8033504Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8033708Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8033725Z 
2025-12-04T12:12:57.8033932Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8034879Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8034884Z 
2025-12-04T12:12:57.8035156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8035369Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8035506Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8035634Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8035968Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8036206Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8036301Z graph_break []
2025-12-04T12:12:57.8036511Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8037245Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8037377Z   warnings.warn(
2025-12-04T12:12:57.8037911Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8038043Z Traceback (most recent call last):
2025-12-04T12:12:57.8038504Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8038712Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8038921Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8038926Z 
2025-12-04T12:12:57.8039137Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8040066Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8040073Z 
2025-12-04T12:12:57.8040332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8040558Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8040669Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8040784Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8041134Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8041351Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8041463Z graph_break []
2025-12-04T12:12:57.8041672Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8042467Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8042588Z   warnings.warn(
2025-12-04T12:12:57.8042797Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8042903Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8043034Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8043248Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8043576Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8043690Z graph_break []
2025-12-04T12:12:57.8043903Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8044674Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8044774Z   warnings.warn(
2025-12-04T12:12:57.8044915Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8045504Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8045627Z Traceback (most recent call last):
2025-12-04T12:12:57.8046100Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8046293Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8046542Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8046548Z 
2025-12-04T12:12:57.8046768Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8047683Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8047688Z 
2025-12-04T12:12:57.8047958Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8048200Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8048310Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8048435Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8048764Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8048975Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8049086Z graph_break []
2025-12-04T12:12:57.8049294Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8050023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8050124Z   warnings.warn(
2025-12-04T12:12:57.8050332Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8050451Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8050566Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8050779Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8051117Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8051211Z graph_break []
2025-12-04T12:12:57.8051433Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8052143Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8052242Z   warnings.warn(
2025-12-04T12:12:57.8052462Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8052569Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8052683Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8052906Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8053238Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8053353Z graph_break []
2025-12-04T12:12:57.8053560Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8054270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8054384Z   warnings.warn(
2025-12-04T12:12:57.8055180Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-50364e1db5a413f2.xml -
2025-12-04T12:12:57.8055387Z =========================== short test summary info ============================
2025-12-04T12:12:57.8056471Z FAILED [0.1563s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8056479Z 
2025-12-04T12:12:57.8056689Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8057620Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8057626Z 
2025-12-04T12:12:57.8057912Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8058103Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8058297Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.8058394Z Got exit code 1
2025-12-04T12:12:57.8059239Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8059669Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.8060300Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-329d5d08d886772a.xml
2025-12-04T12:12:57.8060458Z ============================= test session starts ==============================
2025-12-04T12:12:57.8060797Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8060915Z cachedir: .pytest_cache
2025-12-04T12:12:57.8061421Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8061543Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8061660Z configfile: pytest.ini
2025-12-04T12:12:57.8062237Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8062470Z collecting ... collected 380 items / 77 deselected / 303 selected
2025-12-04T12:12:57.8062610Z stepcurrent: skipping 77 already run items.
2025-12-04T12:12:57.8062720Z Running 98 items in this shard
2025-12-04T12:12:57.8062726Z 
2025-12-04T12:12:57.8063731Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  1%]
2025-12-04T12:12:57.8064717Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0030s] (Skip non-critical tests to save resources.) [  2%]
2025-12-04T12:12:57.8065610Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5611s] [  3%]
2025-12-04T12:12:57.8066489Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1595s] [  3%]
2025-12-04T12:12:57.8067303Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1587s] [  3%]
2025-12-04T12:12:57.8067308Z 
2025-12-04T12:12:57.8067446Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8068018Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8068150Z Traceback (most recent call last):
2025-12-04T12:12:57.8068639Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8068851Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8069058Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8069063Z 
2025-12-04T12:12:57.8069270Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8070242Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8070248Z 
2025-12-04T12:12:57.8070507Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8070736Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8070844Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8070955Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8071299Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8071545Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8071640Z graph_break []
2025-12-04T12:12:57.8071865Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8072584Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8072696Z   warnings.warn(
2025-12-04T12:12:57.8073233Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8073353Z Traceback (most recent call last):
2025-12-04T12:12:57.8073826Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8074015Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8074224Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8074242Z 
2025-12-04T12:12:57.8074451Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8075362Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8075368Z 
2025-12-04T12:12:57.8075640Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8075852Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8075976Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8076087Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8076420Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8076644Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8076745Z graph_break []
2025-12-04T12:12:57.8076953Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8077677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8077776Z   warnings.warn(
2025-12-04T12:12:57.8077984Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8078105Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8078217Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8078478Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8078804Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8078899Z graph_break []
2025-12-04T12:12:57.8079120Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8079864Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8079963Z   warnings.warn(
2025-12-04T12:12:57.8080113Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8080681Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8080814Z Traceback (most recent call last):
2025-12-04T12:12:57.8081274Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8081469Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8081685Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8081690Z 
2025-12-04T12:12:57.8081895Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8082923Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8082930Z 
2025-12-04T12:12:57.8083189Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8083397Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8083521Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8083632Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8083973Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8084189Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8084284Z graph_break []
2025-12-04T12:12:57.8084505Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8085218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8085317Z   warnings.warn(
2025-12-04T12:12:57.8085537Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8085645Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8085772Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8085988Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8086314Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8086432Z graph_break []
2025-12-04T12:12:57.8086641Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8087351Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8087468Z   warnings.warn(
2025-12-04T12:12:57.8087675Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8087795Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8087906Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8088121Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8088459Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8088556Z graph_break []
2025-12-04T12:12:57.8088765Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8089532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8089629Z   warnings.warn(
2025-12-04T12:12:57.8090476Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-329d5d08d886772a.xml -
2025-12-04T12:12:57.8090646Z =========================== short test summary info ============================
2025-12-04T12:12:57.8091687Z FAILED [0.1587s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8091693Z 
2025-12-04T12:12:57.8091948Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8092864Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8092872Z 
2025-12-04T12:12:57.8093140Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8093316Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8093558Z ============= 1 failed, 2 skipped, 77 deselected, 2 rerun in 4.94s =============
2025-12-04T12:12:57.8093669Z Got exit code 1
2025-12-04T12:12:57.8093776Z Retrying single test...
2025-12-04T12:12:57.8094414Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8e3e317a92830ba6.xml
2025-12-04T12:12:57.8094576Z ============================= test session starts ==============================
2025-12-04T12:12:57.8094913Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8095039Z cachedir: .pytest_cache
2025-12-04T12:12:57.8095547Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8095666Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8095785Z configfile: pytest.ini
2025-12-04T12:12:57.8096365Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8096599Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8097597Z stepcurrent: skipping 79 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8097710Z Running 1 items in this shard
2025-12-04T12:12:57.8097714Z 
2025-12-04T12:12:57.8098608Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5419s] [100%]
2025-12-04T12:12:57.8099485Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1587s] [100%]
2025-12-04T12:12:57.8100308Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1580s] [100%]
2025-12-04T12:12:57.8100313Z 
2025-12-04T12:12:57.8100457Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8101223Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8101347Z Traceback (most recent call last):
2025-12-04T12:12:57.8101889Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8102102Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8102313Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8102318Z 
2025-12-04T12:12:57.8102583Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8103506Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8103511Z 
2025-12-04T12:12:57.8103773Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8104044Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8104160Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8104275Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8104623Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8104844Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8104959Z graph_break []
2025-12-04T12:12:57.8105173Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8105938Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8106055Z   warnings.warn(
2025-12-04T12:12:57.8106595Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8106731Z Traceback (most recent call last):
2025-12-04T12:12:57.8107194Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8107389Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8107608Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8107613Z 
2025-12-04T12:12:57.8107819Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8108735Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8108755Z 
2025-12-04T12:12:57.8109013Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8109223Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8109346Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8109460Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8109789Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8110015Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8110109Z graph_break []
2025-12-04T12:12:57.8110329Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8111046Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8111145Z   warnings.warn(
2025-12-04T12:12:57.8111364Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8111470Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8111580Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8111805Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8112132Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8112239Z graph_break []
2025-12-04T12:12:57.8112480Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8113185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8113294Z   warnings.warn(
2025-12-04T12:12:57.8113462Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8114007Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8114136Z Traceback (most recent call last):
2025-12-04T12:12:57.8114595Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8114830Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8115036Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8115041Z 
2025-12-04T12:12:57.8115249Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8116177Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8116182Z 
2025-12-04T12:12:57.8116474Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8116697Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8116806Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8116939Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8117402Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8117621Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8117716Z graph_break []
2025-12-04T12:12:57.8117939Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8118652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8118764Z   warnings.warn(
2025-12-04T12:12:57.8118975Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8119085Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8119209Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8119425Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8119755Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8119864Z graph_break []
2025-12-04T12:12:57.8120073Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8120794Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8120893Z   warnings.warn(
2025-12-04T12:12:57.8121100Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8121218Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8121330Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8121548Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8121888Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8121982Z graph_break []
2025-12-04T12:12:57.8122263Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8122977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8123077Z   warnings.warn(
2025-12-04T12:12:57.8123889Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8e3e317a92830ba6.xml -
2025-12-04T12:12:57.8124124Z =========================== short test summary info ============================
2025-12-04T12:12:57.8125231Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8125240Z 
2025-12-04T12:12:57.8125452Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8126400Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8126420Z 
2025-12-04T12:12:57.8126681Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8126859Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8127067Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:57.8127166Z Got exit code 1
2025-12-04T12:12:57.8127271Z Retrying single test...
2025-12-04T12:12:57.8127954Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-fba34ccbfe47be41.xml
2025-12-04T12:12:57.8128112Z ============================= test session starts ==============================
2025-12-04T12:12:57.8128464Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8128569Z cachedir: .pytest_cache
2025-12-04T12:12:57.8129078Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8129210Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8129318Z configfile: pytest.ini
2025-12-04T12:12:57.8129891Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8130125Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8131126Z stepcurrent: skipping 79 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8131253Z Running 1 items in this shard
2025-12-04T12:12:57.8131258Z 
2025-12-04T12:12:57.8132142Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5722s] [100%]
2025-12-04T12:12:57.8133010Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1612s] [100%]
2025-12-04T12:12:57.8133824Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1586s] [100%]
2025-12-04T12:12:57.8133833Z 
2025-12-04T12:12:57.8133970Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8134520Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8134640Z Traceback (most recent call last):
2025-12-04T12:12:57.8135114Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8135309Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8135515Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8135581Z 
2025-12-04T12:12:57.8135804Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8136747Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8136755Z 
2025-12-04T12:12:57.8137029Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8137244Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8137353Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8137478Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8137841Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8138055Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8138167Z graph_break []
2025-12-04T12:12:57.8138378Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8139111Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8139208Z   warnings.warn(
2025-12-04T12:12:57.8139782Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8139915Z Traceback (most recent call last):
2025-12-04T12:12:57.8140374Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8140563Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8140784Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8140789Z 
2025-12-04T12:12:57.8140995Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8141924Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8141929Z 
2025-12-04T12:12:57.8142191Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8142404Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8142523Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8142634Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8142975Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8143187Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8143285Z graph_break []
2025-12-04T12:12:57.8143506Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8144221Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8144317Z   warnings.warn(
2025-12-04T12:12:57.8144537Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8144647Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8144774Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8144984Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8145310Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8145418Z graph_break []
2025-12-04T12:12:57.8145629Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8146345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8146498Z   warnings.warn(
2025-12-04T12:12:57.8146640Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8147190Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8147314Z Traceback (most recent call last):
2025-12-04T12:12:57.8147806Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8148011Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8148219Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8148225Z 
2025-12-04T12:12:57.8148446Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8149394Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8149402Z 
2025-12-04T12:12:57.8149662Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8149883Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8149991Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8150145Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8150477Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8150690Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8150798Z graph_break []
2025-12-04T12:12:57.8151007Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8151720Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8151832Z   warnings.warn(
2025-12-04T12:12:57.8152044Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8152161Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8152274Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8152487Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8152849Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8152945Z graph_break []
2025-12-04T12:12:57.8153153Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8153878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8153979Z   warnings.warn(
2025-12-04T12:12:57.8154205Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8154314Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8154427Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8154656Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8154985Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8155080Z graph_break []
2025-12-04T12:12:57.8155307Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8156015Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8156127Z   warnings.warn(
2025-12-04T12:12:57.8156934Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-fba34ccbfe47be41.xml -
2025-12-04T12:12:57.8157101Z =========================== short test summary info ============================
2025-12-04T12:12:57.8158195Z FAILED [0.1586s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8158201Z 
2025-12-04T12:12:57.8158448Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8159382Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8159387Z 
2025-12-04T12:12:57.8159647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8159854Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8160065Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.8160164Z Got exit code 1
2025-12-04T12:12:57.8161012Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8161412Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.8162075Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67eecf299b49620e.xml
2025-12-04T12:12:57.8162319Z ============================= test session starts ==============================
2025-12-04T12:12:57.8162664Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8162787Z cachedir: .pytest_cache
2025-12-04T12:12:57.8163299Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8163419Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8163545Z configfile: pytest.ini
2025-12-04T12:12:57.8164118Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8164339Z collecting ... collected 380 items / 80 deselected / 300 selected
2025-12-04T12:12:57.8164497Z stepcurrent: skipping 80 already run items.
2025-12-04T12:12:57.8164610Z Running 95 items in this shard
2025-12-04T12:12:57.8164616Z 
2025-12-04T12:12:57.8165519Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5436s] [  1%]
2025-12-04T12:12:57.8166394Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1600s] [  1%]
2025-12-04T12:12:57.8167197Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1558s] [  1%]
2025-12-04T12:12:57.8167216Z 
2025-12-04T12:12:57.8167356Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8167903Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8168042Z Traceback (most recent call last):
2025-12-04T12:12:57.8168503Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8168699Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8168925Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8168929Z 
2025-12-04T12:12:57.8169139Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8170110Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8170115Z 
2025-12-04T12:12:57.8170374Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8170696Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8170807Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8170916Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8171262Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8171473Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8171598Z graph_break []
2025-12-04T12:12:57.8171823Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8172539Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8172640Z   warnings.warn(
2025-12-04T12:12:57.8173189Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8173360Z Traceback (most recent call last):
2025-12-04T12:12:57.8173838Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8174027Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8174232Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8174237Z 
2025-12-04T12:12:57.8174460Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8175379Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8175386Z 
2025-12-04T12:12:57.8175656Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8175866Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8175979Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8176102Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8176432Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8176657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8176753Z graph_break []
2025-12-04T12:12:57.8176964Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8177694Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8177792Z   warnings.warn(
2025-12-04T12:12:57.8178003Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8178122Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8178232Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8178460Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8178789Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8178883Z graph_break []
2025-12-04T12:12:57.8179101Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8179808Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8179907Z   warnings.warn(
2025-12-04T12:12:57.8180058Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8180627Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8180757Z Traceback (most recent call last):
2025-12-04T12:12:57.8181216Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8181442Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8181665Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8181670Z 
2025-12-04T12:12:57.8181878Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8182832Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8182838Z 
2025-12-04T12:12:57.8183096Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8183309Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8183431Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8183542Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8183870Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8184129Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8184224Z graph_break []
2025-12-04T12:12:57.8184445Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8185160Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8185260Z   warnings.warn(
2025-12-04T12:12:57.8185481Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8185590Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8185702Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8185928Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8186257Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8186365Z graph_break []
2025-12-04T12:12:57.8186576Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8187282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8187392Z   warnings.warn(
2025-12-04T12:12:57.8187599Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8187709Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8187831Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8188043Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8188385Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8188479Z graph_break []
2025-12-04T12:12:57.8188685Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8189407Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8189506Z   warnings.warn(
2025-12-04T12:12:57.8190304Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67eecf299b49620e.xml -
2025-12-04T12:12:57.8190483Z =========================== short test summary info ============================
2025-12-04T12:12:57.8191524Z FAILED [0.1558s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8191569Z 
2025-12-04T12:12:57.8191792Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8192736Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8192744Z 
2025-12-04T12:12:57.8193018Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8193193Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8193383Z ================== 1 failed, 80 deselected, 2 rerun in 4.91s ===================
2025-12-04T12:12:57.8193522Z Got exit code 1
2025-12-04T12:12:57.8193628Z Retrying single test...
2025-12-04T12:12:57.8194252Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-689365daff97a217.xml
2025-12-04T12:12:57.8194423Z ============================= test session starts ==============================
2025-12-04T12:12:57.8194765Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8194887Z cachedir: .pytest_cache
2025-12-04T12:12:57.8195427Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8195545Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8195662Z configfile: pytest.ini
2025-12-04T12:12:57.8196238Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8196475Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8197466Z stepcurrent: skipping 80 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8197580Z Running 1 items in this shard
2025-12-04T12:12:57.8197585Z 
2025-12-04T12:12:57.8198481Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5605s] [100%]
2025-12-04T12:12:57.8199357Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1647s] [100%]
2025-12-04T12:12:57.8200169Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1576s] [100%]
2025-12-04T12:12:57.8200174Z 
2025-12-04T12:12:57.8200310Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8201049Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8201171Z Traceback (most recent call last):
2025-12-04T12:12:57.8201635Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8201844Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8202050Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8202054Z 
2025-12-04T12:12:57.8202358Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8203292Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8203298Z 
2025-12-04T12:12:57.8203649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8203875Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8203987Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8204101Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8204489Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8204711Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8204826Z graph_break []
2025-12-04T12:12:57.8205036Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8205796Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8205908Z   warnings.warn(
2025-12-04T12:12:57.8206443Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8206566Z Traceback (most recent call last):
2025-12-04T12:12:57.8207042Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8207233Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8207494Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8207499Z 
2025-12-04T12:12:57.8207705Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8208620Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8208625Z 
2025-12-04T12:12:57.8208898Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8209109Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8209232Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8209344Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8209677Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8215216Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8215357Z graph_break []
2025-12-04T12:12:57.8215587Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8216332Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8216436Z   warnings.warn(
2025-12-04T12:12:57.8216658Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8216788Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8216904Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8217144Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8217478Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8217580Z graph_break []
2025-12-04T12:12:57.8217811Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8218526Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8218626Z   warnings.warn(
2025-12-04T12:12:57.8218783Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8219328Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8219465Z Traceback (most recent call last):
2025-12-04T12:12:57.8219928Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8220213Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8220438Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8220445Z 
2025-12-04T12:12:57.8220697Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8221637Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8221643Z 
2025-12-04T12:12:57.8221909Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8222177Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8222303Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8222417Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8222754Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8222966Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8223067Z graph_break []
2025-12-04T12:12:57.8223290Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8224041Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8224140Z   warnings.warn(
2025-12-04T12:12:57.8224360Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8224468Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8224579Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8224805Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8225133Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8225244Z graph_break []
2025-12-04T12:12:57.8225452Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8226162Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8226276Z   warnings.warn(
2025-12-04T12:12:57.8226483Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8226590Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8226712Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8226929Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8227271Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8227366Z graph_break []
2025-12-04T12:12:57.8227574Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8228302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8228398Z   warnings.warn(
2025-12-04T12:12:57.8229200Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-689365daff97a217.xml -
2025-12-04T12:12:57.8229383Z =========================== short test summary info ============================
2025-12-04T12:12:57.8230437Z FAILED [0.1576s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8230446Z 
2025-12-04T12:12:57.8230669Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8231583Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8231623Z 
2025-12-04T12:12:57.8231899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8232101Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8232301Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.8232408Z Got exit code 1
2025-12-04T12:12:57.8232510Z Retrying single test...
2025-12-04T12:12:57.8233139Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61d7df0dfd715866.xml
2025-12-04T12:12:57.8233343Z ============================= test session starts ==============================
2025-12-04T12:12:57.8233691Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8233812Z cachedir: .pytest_cache
2025-12-04T12:12:57.8234318Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8234438Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8234555Z configfile: pytest.ini
2025-12-04T12:12:57.8235161Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8235385Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8236393Z stepcurrent: skipping 80 already run items. Running only test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8236505Z Running 1 items in this shard
2025-12-04T12:12:57.8236510Z 
2025-12-04T12:12:57.8237412Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5337s] [100%]
2025-12-04T12:12:57.8238294Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1592s] [100%]
2025-12-04T12:12:57.8239110Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1559s] [100%]
2025-12-04T12:12:57.8239116Z 
2025-12-04T12:12:57.8239255Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8239794Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8239927Z Traceback (most recent call last):
2025-12-04T12:12:57.8240390Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8240598Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8240803Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8240809Z 
2025-12-04T12:12:57.8241021Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8241942Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8241947Z 
2025-12-04T12:12:57.8242296Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8242533Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8242649Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8242807Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8243160Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8243374Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8243473Z graph_break []
2025-12-04T12:12:57.8243733Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8244459Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8244570Z   warnings.warn(
2025-12-04T12:12:57.8245111Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8245260Z Traceback (most recent call last):
2025-12-04T12:12:57.8245735Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8245931Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8246150Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8246155Z 
2025-12-04T12:12:57.8246360Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8247277Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8247312Z 
2025-12-04T12:12:57.8247581Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8247794Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8247917Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8248031Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8248362Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8248589Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8248683Z graph_break []
2025-12-04T12:12:57.8248891Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8249622Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8249724Z   warnings.warn(
2025-12-04T12:12:57.8249953Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8250060Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8250170Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8250395Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8250723Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8250817Z graph_break []
2025-12-04T12:12:57.8251039Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8251744Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8251853Z   warnings.warn(
2025-12-04T12:12:57.8251998Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8252535Z _ MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8252664Z Traceback (most recent call last):
2025-12-04T12:12:57.8253123Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8253317Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8253534Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8253571Z 
2025-12-04T12:12:57.8253781Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8254703Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8254740Z 
2025-12-04T12:12:57.8255001Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8255210Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8255330Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8255440Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8255783Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8256029Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8256124Z graph_break []
2025-12-04T12:12:57.8256349Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8257070Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8257169Z   warnings.warn(
2025-12-04T12:12:57.8257392Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8257534Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8257660Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8257871Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8258199Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8258306Z graph_break []
2025-12-04T12:12:57.8258516Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8259227Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8259342Z   warnings.warn(
2025-12-04T12:12:57.8259551Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8259670Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8259783Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8260002Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8260340Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8260434Z graph_break []
2025-12-04T12:12:57.8260641Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8261360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8261458Z   warnings.warn(
2025-12-04T12:12:57.8262269Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61d7df0dfd715866.xml -
2025-12-04T12:12:57.8262437Z =========================== short test summary info ============================
2025-12-04T12:12:57.8263479Z FAILED [0.1559s] inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8263500Z 
2025-12-04T12:12:57.8263710Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8264626Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py MixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8264631Z 
2025-12-04T12:12:57.8264904Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8265116Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8265320Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ==================
2025-12-04T12:12:57.8265417Z Got exit code 1
2025-12-04T12:12:57.8266293Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8266710Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.8267338Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bbb315e2c7566474.xml
2025-12-04T12:12:57.8267527Z ============================= test session starts ==============================
2025-12-04T12:12:57.8267882Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8267992Z cachedir: .pytest_cache
2025-12-04T12:12:57.8268513Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8268637Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8268778Z configfile: pytest.ini
2025-12-04T12:12:57.8269368Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8269590Z collecting ... collected 380 items / 81 deselected / 299 selected
2025-12-04T12:12:57.8269743Z stepcurrent: skipping 81 already run items.
2025-12-04T12:12:57.8269853Z Running 94 items in this shard
2025-12-04T12:12:57.8269859Z 
2025-12-04T12:12:57.8270860Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  1%]
2025-12-04T12:12:57.8271859Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0031s] (Skip non-critical tests to save resources.) [  2%]
2025-12-04T12:12:57.8272842Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0036s] (Skip non-critical tests to save resources.) [  3%]
2025-12-04T12:12:57.8273836Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0027s] (Skip non-critical tests to save resources.) [  4%]
2025-12-04T12:12:57.8274821Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [  5%]
2025-12-04T12:12:57.8275822Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [  6%]
2025-12-04T12:12:57.8276804Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [  7%]
2025-12-04T12:12:57.8277798Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0028s] (Skip non-critical tests to save resources.) [  8%]
2025-12-04T12:12:57.8278343Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims0 PASSED [6.4807s] [  9%]
2025-12-04T12:12:57.8278921Z inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims2 PASSED [1.5991s] [ 10%]
2025-12-04T12:12:57.8279566Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_3layer_split_reduction SKIPPED [0.0034s] (Mix order reduction not enabled) [ 11%]
2025-12-04T12:12:57.8280219Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_XBLOCK_coordest_tuning SKIPPED [0.0028s] (Mix order reduction not enabled) [ 12%]
2025-12-04T12:12:57.8280822Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_True_shape0 PASSED [1.1809s] [ 13%]
2025-12-04T12:12:57.8281438Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_no_bias_split_reductions_True_shape1 PASSED [1.1370s] [ 14%]
2025-12-04T12:12:57.8282244Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 ('RERUN', {'yellow': True}) [0.1744s] [ 15%]
2025-12-04T12:12:57.8283001Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 ('RERUN', {'yellow': True}) [0.1396s] [ 15%]
2025-12-04T12:12:57.8283648Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 FAILED [0.1378s] [ 15%]
2025-12-04T12:12:57.8283694Z 
2025-12-04T12:12:57.8283849Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8284232Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _
2025-12-04T12:12:57.8284351Z Traceback (most recent call last):
2025-12-04T12:12:57.8284893Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias
2025-12-04T12:12:57.8285088Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8285314Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8285319Z 
2025-12-04T12:12:57.8285531Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8286297Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8286319Z 
2025-12-04T12:12:57.8286577Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8286793Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8286918Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8287033Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8287251Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8287384Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8287477Z graph_break []
2025-12-04T12:12:57.8287689Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8288422Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8288521Z   warnings.warn(
2025-12-04T12:12:57.8288917Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _
2025-12-04T12:12:57.8289036Z Traceback (most recent call last):
2025-12-04T12:12:57.8289553Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias
2025-12-04T12:12:57.8289759Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8289964Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8289969Z 
2025-12-04T12:12:57.8290191Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8290986Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8290991Z 
2025-12-04T12:12:57.8291251Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8291506Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8291619Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8291734Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8291963Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8292080Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8292190Z graph_break []
2025-12-04T12:12:57.8292432Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8293148Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8293262Z   warnings.warn(
2025-12-04T12:12:57.8293470Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8293577Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8293701Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8293916Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8294078Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8294172Z graph_break []
2025-12-04T12:12:57.8294379Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8295099Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8295197Z   warnings.warn(
2025-12-04T12:12:57.8295336Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8295734Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _
2025-12-04T12:12:57.8295851Z Traceback (most recent call last):
2025-12-04T12:12:57.8296385Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias
2025-12-04T12:12:57.8296580Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8296786Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8296791Z 
2025-12-04T12:12:57.8297008Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8297772Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8297779Z 
2025-12-04T12:12:57.8298050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8298263Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8298374Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8298497Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8298710Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8298827Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8298939Z graph_break []
2025-12-04T12:12:57.8299148Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8299868Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8299966Z   warnings.warn(
2025-12-04T12:12:57.8300175Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8300299Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8300411Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8300626Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8300801Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8301187Z graph_break []
2025-12-04T12:12:57.8301402Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8302223Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8302329Z   warnings.warn(
2025-12-04T12:12:57.8302556Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8302664Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8302777Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8303009Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8303174Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8303270Z graph_break []
2025-12-04T12:12:57.8303490Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8304202Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8304312Z   warnings.warn(
2025-12-04T12:12:57.8305113Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bbb315e2c7566474.xml -
2025-12-04T12:12:57.8305321Z =========================== short test summary info ============================
2025-12-04T12:12:57.8306240Z FAILED [0.1378s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8306247Z 
2025-12-04T12:12:57.8306462Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8307238Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8307246Z 
2025-12-04T12:12:57.8307505Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8307681Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8307929Z ======= 1 failed, 4 passed, 10 skipped, 81 deselected, 2 rerun in 10.95s =======
2025-12-04T12:12:57.8308025Z Got exit code 1
2025-12-04T12:12:57.8308145Z Retrying single test...
2025-12-04T12:12:57.8308774Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cbdafed15e10f46.xml
2025-12-04T12:12:57.8308934Z ============================= test session starts ==============================
2025-12-04T12:12:57.8309288Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8309395Z cachedir: .pytest_cache
2025-12-04T12:12:57.8309900Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8310033Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8310140Z configfile: pytest.ini
2025-12-04T12:12:57.8310729Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8310954Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8311795Z stepcurrent: skipping 95 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8311920Z Running 1 items in this shard
2025-12-04T12:12:57.8311925Z 
2025-12-04T12:12:57.8312656Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 ('RERUN', {'yellow': True}) [4.5336s] [100%]
2025-12-04T12:12:57.8313467Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 ('RERUN', {'yellow': True}) [0.1406s] [100%]
2025-12-04T12:12:57.8314143Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 FAILED [0.1374s] [100%]
2025-12-04T12:12:57.8314150Z 
2025-12-04T12:12:57.8314304Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8314687Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _
2025-12-04T12:12:57.8314807Z Traceback (most recent call last):
2025-12-04T12:12:57.8315380Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias
2025-12-04T12:12:57.8315575Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8315788Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8315793Z 
2025-12-04T12:12:57.8316013Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8316778Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8316813Z 
2025-12-04T12:12:57.8317088Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8317303Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8317411Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8317535Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8317655Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8317881Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8317977Z graph_break []
2025-12-04T12:12:57.8318188Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8318915Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8319013Z   warnings.warn(
2025-12-04T12:12:57.8319399Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _
2025-12-04T12:12:57.8319533Z Traceback (most recent call last):
2025-12-04T12:12:57.8320055Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias
2025-12-04T12:12:57.8320263Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8320471Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8320476Z 
2025-12-04T12:12:57.8320685Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8321458Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8321463Z 
2025-12-04T12:12:57.8321722Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8321948Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8322059Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8322238Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8322374Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8322590Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8322685Z graph_break []
2025-12-04T12:12:57.8322909Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8323618Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8323774Z   warnings.warn(
2025-12-04T12:12:57.8323984Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8324089Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8324212Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8324456Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8324576Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8324681Z graph_break []
2025-12-04T12:12:57.8324888Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8325597Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8325736Z   warnings.warn(
2025-12-04T12:12:57.8325878Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8326276Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _
2025-12-04T12:12:57.8326398Z Traceback (most recent call last):
2025-12-04T12:12:57.8326920Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias
2025-12-04T12:12:57.8327129Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8327368Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8327373Z 
2025-12-04T12:12:57.8327594Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8328361Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8328366Z 
2025-12-04T12:12:57.8328628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8328853Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8328966Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8329077Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8329209Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8329427Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8329539Z graph_break []
2025-12-04T12:12:57.8329749Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8330461Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8330572Z   warnings.warn(
2025-12-04T12:12:57.8330780Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8330892Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8331017Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8331234Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8331365Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8331459Z graph_break []
2025-12-04T12:12:57.8331667Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8332384Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8332492Z   warnings.warn(
2025-12-04T12:12:57.8332708Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8332814Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8332924Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8333151Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8333271Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8333363Z graph_break []
2025-12-04T12:12:57.8333586Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8334333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8334445Z   warnings.warn(
2025-12-04T12:12:57.8335280Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cbdafed15e10f46.xml -
2025-12-04T12:12:57.8335449Z =========================== short test summary info ============================
2025-12-04T12:12:57.8336364Z FAILED [0.1374s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8336370Z 
2025-12-04T12:12:57.8336611Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8337381Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8337388Z 
2025-12-04T12:12:57.8337649Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8337824Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8338070Z ================== 1 failed, 174 deselected, 2 rerun in 4.86s ==================
2025-12-04T12:12:57.8338167Z Got exit code 1
2025-12-04T12:12:57.8338272Z Retrying single test...
2025-12-04T12:12:57.8338911Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-250d1e9631b51e82.xml
2025-12-04T12:12:57.8339071Z ============================= test session starts ==============================
2025-12-04T12:12:57.8339428Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8339535Z cachedir: .pytest_cache
2025-12-04T12:12:57.8340043Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8340178Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8340286Z configfile: pytest.ini
2025-12-04T12:12:57.8340876Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8341103Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8341942Z stepcurrent: skipping 95 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8342068Z Running 1 items in this shard
2025-12-04T12:12:57.8342073Z 
2025-12-04T12:12:57.8342810Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 ('RERUN', {'yellow': True}) [4.5108s] [100%]
2025-12-04T12:12:57.8343552Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 ('RERUN', {'yellow': True}) [0.1396s] [100%]
2025-12-04T12:12:57.8344200Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 FAILED [0.1364s] [100%]
2025-12-04T12:12:57.8344208Z 
2025-12-04T12:12:57.8344346Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8344738Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _
2025-12-04T12:12:57.8344855Z Traceback (most recent call last):
2025-12-04T12:12:57.8345383Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias
2025-12-04T12:12:57.8345576Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8345816Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8345821Z 
2025-12-04T12:12:57.8346040Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8346835Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8346844Z 
2025-12-04T12:12:57.8347115Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8347330Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8347438Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8347557Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8347703Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8347921Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8348027Z graph_break []
2025-12-04T12:12:57.8348238Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8348964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8349062Z   warnings.warn(
2025-12-04T12:12:57.8349490Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _
2025-12-04T12:12:57.8349620Z Traceback (most recent call last):
2025-12-04T12:12:57.8350135Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias
2025-12-04T12:12:57.8350328Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8350548Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8350554Z 
2025-12-04T12:12:57.8350762Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8351540Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8351545Z 
2025-12-04T12:12:57.8351800Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8352016Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8352138Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8352249Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8352382Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8352600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8352695Z graph_break []
2025-12-04T12:12:57.8352916Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8353623Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8353725Z   warnings.warn(
2025-12-04T12:12:57.8353945Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8354052Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8354176Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8354391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8354510Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8354615Z graph_break []
2025-12-04T12:12:57.8354822Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8355531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8355639Z   warnings.warn(
2025-12-04T12:12:57.8355776Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8356170Z _ NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 _
2025-12-04T12:12:57.8356399Z Traceback (most recent call last):
2025-12-04T12:12:57.8356916Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 392, in test_layer_norm_bwd_with_bias
2025-12-04T12:12:57.8357149Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8357358Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8357363Z 
2025-12-04T12:12:57.8357570Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8358347Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8358351Z 
2025-12-04T12:12:57.8358638Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8358861Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8358972Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8359082Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8359210Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8359424Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8359561Z graph_break []
2025-12-04T12:12:57.8359771Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8360483Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8360593Z   warnings.warn(
2025-12-04T12:12:57.8360798Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8360908Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8361031Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8361246Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8361365Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8361471Z graph_break []
2025-12-04T12:12:57.8361677Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8362479Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8362580Z   warnings.warn(
2025-12-04T12:12:57.8362787Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8362910Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8363021Z stats [('calls_captured', 3)]
2025-12-04T12:12:57.8363236Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8363369Z inductor [('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8363463Z graph_break []
2025-12-04T12:12:57.8363685Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8364393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8364491Z   warnings.warn(
2025-12-04T12:12:57.8365304Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-250d1e9631b51e82.xml -
2025-12-04T12:12:57.8365471Z =========================== short test summary info ============================
2025-12-04T12:12:57.8366384Z FAILED [0.1364s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1 - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8366391Z 
2025-12-04T12:12:57.8366601Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8367359Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8367404Z 
2025-12-04T12:12:57.8367674Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8367847Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8368084Z ================== 1 failed, 174 deselected, 2 rerun in 4.84s ==================
2025-12-04T12:12:57.8368180Z Got exit code 1
2025-12-04T12:12:57.8368862Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1
2025-12-04T12:12:57.8369275Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.8369928Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-659d038e96b5f102.xml
2025-12-04T12:12:57.8370103Z ============================= test session starts ==============================
2025-12-04T12:12:57.8370445Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8370550Z cachedir: .pytest_cache
2025-12-04T12:12:57.8371073Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8371224Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8371331Z configfile: pytest.ini
2025-12-04T12:12:57.8371917Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8372138Z collecting ... collected 380 items / 96 deselected / 284 selected
2025-12-04T12:12:57.8372294Z stepcurrent: skipping 96 already run items.
2025-12-04T12:12:57.8372404Z Running 79 items in this shard
2025-12-04T12:12:57.8372409Z 
2025-12-04T12:12:57.8373056Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_False_shape0 PASSED [5.4991s] [  1%]
2025-12-04T12:12:57.8373705Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_float32_split_reductions_True_shape0 PASSED [1.0878s] [  2%]
2025-12-04T12:12:57.8374436Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims0 SKIPPED [0.0032s] (Mix order reduction not enabled) [  3%]
2025-12-04T12:12:57.8375175Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_dynamic_shape_dynamic_dims2 SKIPPED [0.0028s] (Mix order reduction not enabled) [  5%]
2025-12-04T12:12:57.8375852Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_False_shape0 PASSED [0.3674s] [  6%]
2025-12-04T12:12:57.8376521Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_False_split_reductions_True_shape0 PASSED [0.4677s] [  7%]
2025-12-04T12:12:57.8377201Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_mean_swap_True_split_reductions_True_shape1 PASSED [0.4821s] [  8%]
2025-12-04T12:12:57.8377884Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_False_shape1 PASSED [0.5381s] [ 10%]
2025-12-04T12:12:57.8378568Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_False_split_reductions_True_shape1 PASSED [0.5491s] [ 11%]
2025-12-04T12:12:57.8379236Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape0 PASSED [0.5275s] [ 12%]
2025-12-04T12:12:57.8380034Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_False_shape2 SKIPPED [0.0031s] (Invalid combination) [ 13%]
2025-12-04T12:12:57.8380745Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_prod_swap_True_split_reductions_True_shape0 PASSED [0.5533s] [ 15%]
2025-12-04T12:12:57.8381441Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_False_shape0 PASSED [0.5751s] [ 16%]
2025-12-04T12:12:57.8382123Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_False_split_reductions_True_shape2 PASSED [0.7191s] [ 17%]
2025-12-04T12:12:57.8382795Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_False_shape1 PASSED [0.5642s] [ 18%]
2025-12-04T12:12:57.8383495Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_False_shape2 PASSED [0.2829s] [ 20%]
2025-12-04T12:12:57.8384155Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape0 PASSED [0.2914s] [ 21%]
2025-12-04T12:12:57.8384824Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_mix_order_reduction_name_sum_swap_True_split_reductions_True_shape1 PASSED [0.2882s] [ 22%]
2025-12-04T12:12:57.8385335Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_multi_workspace_allocation PASSED [0.6593s] [ 24%]
2025-12-04T12:12:57.8385792Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_non_contiguous_input PASSED [0.8390s] [ 25%]
2025-12-04T12:12:57.8386717Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.2270s] [ 26%]
2025-12-04T12:12:57.8387617Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1587s] [ 26%]
2025-12-04T12:12:57.8388459Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1569s] [ 26%]
2025-12-04T12:12:57.8388469Z 
2025-12-04T12:12:57.8388609Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8389174Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8389308Z Traceback (most recent call last):
2025-12-04T12:12:57.8389773Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8389978Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8390185Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8390192Z 
2025-12-04T12:12:57.8390400Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8391360Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8391368Z 
2025-12-04T12:12:57.8391628Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8391854Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8391963Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8392074Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8392304Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8392636Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8392741Z graph_break []
2025-12-04T12:12:57.8392986Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8393704Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8393813Z   warnings.warn(
2025-12-04T12:12:57.8394407Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8394527Z Traceback (most recent call last):
2025-12-04T12:12:57.8395001Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8395194Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8395452Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8395458Z 
2025-12-04T12:12:57.8395666Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8396614Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8396619Z 
2025-12-04T12:12:57.8396892Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8397135Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8397256Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8397369Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8397587Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8397932Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8398029Z graph_break []
2025-12-04T12:12:57.8398239Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8398966Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8399065Z   warnings.warn(
2025-12-04T12:12:57.8399287Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8399397Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8399509Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8399738Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8400066Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8400162Z graph_break []
2025-12-04T12:12:57.8400386Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8401449Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8401570Z   warnings.warn(
2025-12-04T12:12:57.8401711Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8402338Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8402481Z Traceback (most recent call last):
2025-12-04T12:12:57.8402944Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8403137Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8403355Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8403361Z 
2025-12-04T12:12:57.8403573Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8404534Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8404640Z 
2025-12-04T12:12:57.8404903Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8405129Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8405239Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8405394Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8405624Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8405951Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8406049Z graph_break []
2025-12-04T12:12:57.8406272Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8407029Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8407145Z   warnings.warn(
2025-12-04T12:12:57.8407352Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8407459Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8407586Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8407800Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8408173Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8408278Z graph_break []
2025-12-04T12:12:57.8408487Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8409191Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8409302Z   warnings.warn(
2025-12-04T12:12:57.8409515Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8409635Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8409744Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8409957Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8410294Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8410389Z graph_break []
2025-12-04T12:12:57.8410601Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8411317Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8411412Z   warnings.warn(
2025-12-04T12:12:57.8412227Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-659d038e96b5f102.xml -
2025-12-04T12:12:57.8412394Z =========================== short test summary info ============================
2025-12-04T12:12:57.8413473Z FAILED [0.1569s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8413490Z 
2025-12-04T12:12:57.8413701Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8414648Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8414653Z 
2025-12-04T12:12:57.8414925Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8415102Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8415347Z ======= 1 failed, 17 passed, 3 skipped, 96 deselected, 2 rerun in 14.93s =======
2025-12-04T12:12:57.8415489Z Got exit code 1
2025-12-04T12:12:57.8415594Z Retrying single test...
2025-12-04T12:12:57.8416225Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-52f302a009c99a45.xml
2025-12-04T12:12:57.8416384Z ============================= test session starts ==============================
2025-12-04T12:12:57.8416756Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8416875Z cachedir: .pytest_cache
2025-12-04T12:12:57.8417385Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8417520Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8417625Z configfile: pytest.ini
2025-12-04T12:12:57.8418231Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8418464Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8419491Z stepcurrent: skipping 116 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8419646Z Running 1 items in this shard
2025-12-04T12:12:57.8419651Z 
2025-12-04T12:12:57.8420554Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5621s] [100%]
2025-12-04T12:12:57.8421454Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1646s] [100%]
2025-12-04T12:12:57.8422285Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1585s] [100%]
2025-12-04T12:12:57.8422293Z 
2025-12-04T12:12:57.8422430Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8423005Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8423126Z Traceback (most recent call last):
2025-12-04T12:12:57.8423587Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8423790Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8423994Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8423999Z 
2025-12-04T12:12:57.8424221Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8425162Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8425171Z 
2025-12-04T12:12:57.8425431Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8425657Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8425768Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8425892Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8426223Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8426439Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8426548Z graph_break []
2025-12-04T12:12:57.8426759Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8427478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8427629Z   warnings.warn(
2025-12-04T12:12:57.8428192Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8428322Z Traceback (most recent call last):
2025-12-04T12:12:57.8428816Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8429010Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8429224Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8429229Z 
2025-12-04T12:12:57.8429439Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8430426Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8430434Z 
2025-12-04T12:12:57.8430694Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8430904Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8431024Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8431171Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8431520Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8431736Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8431832Z graph_break []
2025-12-04T12:12:57.8432051Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8432770Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8432869Z   warnings.warn(
2025-12-04T12:12:57.8433090Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8433198Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8433321Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8433532Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8433862Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8433968Z graph_break []
2025-12-04T12:12:57.8434174Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8434883Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8434992Z   warnings.warn(
2025-12-04T12:12:57.8435136Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8435712Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8435833Z Traceback (most recent call last):
2025-12-04T12:12:57.8436291Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8436501Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8436708Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8436713Z 
2025-12-04T12:12:57.8436920Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8437875Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8437881Z 
2025-12-04T12:12:57.8438142Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8438398Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8438509Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8438620Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8438965Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8439231Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8439342Z graph_break []
2025-12-04T12:12:57.8439549Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8440266Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8440376Z   warnings.warn(
2025-12-04T12:12:57.8440613Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8440726Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8440852Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8441065Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8441404Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8441500Z graph_break []
2025-12-04T12:12:57.8441710Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8442550Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8442651Z   warnings.warn(
2025-12-04T12:12:57.8442863Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8442985Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8443097Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8443325Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8443653Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8443748Z graph_break []
2025-12-04T12:12:57.8443970Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8444677Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8444776Z   warnings.warn(
2025-12-04T12:12:57.8445583Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-52f302a009c99a45.xml -
2025-12-04T12:12:57.8445752Z =========================== short test summary info ============================
2025-12-04T12:12:57.8446837Z FAILED [0.1585s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8446846Z 
2025-12-04T12:12:57.8447056Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8448011Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8448018Z 
2025-12-04T12:12:57.8448277Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8448454Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8448659Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.8448755Z Got exit code 1
2025-12-04T12:12:57.8448864Z Retrying single test...
2025-12-04T12:12:57.8449500Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-edb5da82dbb96991.xml
2025-12-04T12:12:57.8449699Z ============================= test session starts ==============================
2025-12-04T12:12:57.8450050Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8450156Z cachedir: .pytest_cache
2025-12-04T12:12:57.8450692Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8450825Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8450930Z configfile: pytest.ini
2025-12-04T12:12:57.8451505Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8451768Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8452800Z stepcurrent: skipping 116 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8452923Z Running 1 items in this shard
2025-12-04T12:12:57.8452928Z 
2025-12-04T12:12:57.8453833Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5950s] [100%]
2025-12-04T12:12:57.8454788Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1680s] [100%]
2025-12-04T12:12:57.8455613Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1645s] [100%]
2025-12-04T12:12:57.8455618Z 
2025-12-04T12:12:57.8455756Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8456349Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8456469Z Traceback (most recent call last):
2025-12-04T12:12:57.8456947Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8457144Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8457353Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8457358Z 
2025-12-04T12:12:57.8457580Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8458522Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8458527Z 
2025-12-04T12:12:57.8458802Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8459016Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8459126Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8459253Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8459590Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8459820Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8459918Z graph_break []
2025-12-04T12:12:57.8460128Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8460866Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8460966Z   warnings.warn(
2025-12-04T12:12:57.8461530Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8461697Z Traceback (most recent call last):
2025-12-04T12:12:57.8462156Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8462366Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8462605Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8462611Z 
2025-12-04T12:12:57.8462819Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8463773Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8463778Z 
2025-12-04T12:12:57.8464069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8464297Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8464410Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8464522Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8464867Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8465083Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8465210Z graph_break []
2025-12-04T12:12:57.8465432Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8466145Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8466260Z   warnings.warn(
2025-12-04T12:12:57.8466471Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8466578Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8466702Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8466918Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8467246Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8467355Z graph_break []
2025-12-04T12:12:57.8467566Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8468293Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8468390Z   warnings.warn(
2025-12-04T12:12:57.8468528Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8469104Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8469221Z Traceback (most recent call last):
2025-12-04T12:12:57.8469692Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8469887Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8470090Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8470095Z 
2025-12-04T12:12:57.8470315Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8471253Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8471258Z 
2025-12-04T12:12:57.8471529Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8471742Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8471851Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8471976Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8472342Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8472553Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8472661Z graph_break []
2025-12-04T12:12:57.8472872Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8473635Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8473734Z   warnings.warn(
2025-12-04T12:12:57.8473947Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8474070Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8474180Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8474421Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8474766Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8474864Z graph_break []
2025-12-04T12:12:57.8475074Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8475803Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8475932Z   warnings.warn(
2025-12-04T12:12:57.8476150Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8476260Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8476372Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8476597Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8476925Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8477021Z graph_break []
2025-12-04T12:12:57.8477242Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8477950Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8478058Z   warnings.warn(
2025-12-04T12:12:57.8478859Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-edb5da82dbb96991.xml -
2025-12-04T12:12:57.8479027Z =========================== short test summary info ============================
2025-12-04T12:12:57.8480110Z FAILED [0.1645s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8480117Z 
2025-12-04T12:12:57.8480328Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8481281Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8481288Z 
2025-12-04T12:12:57.8481550Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8481742Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8481936Z ================== 1 failed, 174 deselected, 2 rerun in 4.98s ==================
2025-12-04T12:12:57.8482032Z Got exit code 1
2025-12-04T12:12:57.8482973Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8483380Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.8484017Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-26ee580f1806e0f2.xml
2025-12-04T12:12:57.8484235Z ============================= test session starts ==============================
2025-12-04T12:12:57.8484577Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8484734Z cachedir: .pytest_cache
2025-12-04T12:12:57.8485244Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8485365Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8485487Z configfile: pytest.ini
2025-12-04T12:12:57.8486064Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8486338Z collecting ... collected 380 items / 117 deselected / 263 selected
2025-12-04T12:12:57.8486484Z stepcurrent: skipping 117 already run items.
2025-12-04T12:12:57.8486596Z Running 58 items in this shard
2025-12-04T12:12:57.8486601Z 
2025-12-04T12:12:57.8487632Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [  1%]
2025-12-04T12:12:57.8488677Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0030s] (Skip non-critical tests to save resources.) [  3%]
2025-12-04T12:12:57.8489701Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0037s] (Skip non-critical tests to save resources.) [  5%]
2025-12-04T12:12:57.8490704Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0027s] (Skip non-critical tests to save resources.) [  6%]
2025-12-04T12:12:57.8491728Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0027s] (Skip non-critical tests to save resources.) [  8%]
2025-12-04T12:12:57.8492732Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [ 10%]
2025-12-04T12:12:57.8493752Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [ 12%]
2025-12-04T12:12:57.8494756Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0029s] (Skip non-critical tests to save resources.) [ 13%]
2025-12-04T12:12:57.8495763Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [ 15%]
2025-12-04T12:12:57.8496785Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0027s] (Skip non-critical tests to save resources.) [ 17%]
2025-12-04T12:12:57.8497794Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0027s] (Skip non-critical tests to save resources.) [ 18%]
2025-12-04T12:12:57.8498847Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0029s] (Skip non-critical tests to save resources.) [ 20%]
2025-12-04T12:12:57.8499777Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5527s] [ 22%]
2025-12-04T12:12:57.8500699Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1638s] [ 22%]
2025-12-04T12:12:57.8501877Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1616s] [ 22%]
2025-12-04T12:12:57.8501886Z 
2025-12-04T12:12:57.8502042Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8502609Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8502730Z Traceback (most recent call last):
2025-12-04T12:12:57.8503210Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8503452Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8503675Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8503680Z 
2025-12-04T12:12:57.8503890Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8504833Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8504841Z 
2025-12-04T12:12:57.8505114Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8505327Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8505449Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8505561Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8505897Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8506127Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8506224Z graph_break []
2025-12-04T12:12:57.8506435Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8507167Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8507267Z   warnings.warn(
2025-12-04T12:12:57.8507841Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8507961Z Traceback (most recent call last):
2025-12-04T12:12:57.8508419Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8508627Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8508835Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8508840Z 
2025-12-04T12:12:57.8509047Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8510001Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8510007Z 
2025-12-04T12:12:57.8510264Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8510540Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8510651Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8510763Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8511104Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8511357Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8511468Z graph_break []
2025-12-04T12:12:57.8511678Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8512395Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8512506Z   warnings.warn(
2025-12-04T12:12:57.8512743Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8512851Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8512979Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8513189Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8513530Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8513626Z graph_break []
2025-12-04T12:12:57.8513838Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8514593Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8514692Z   warnings.warn(
2025-12-04T12:12:57.8514832Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8515414Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8515534Z Traceback (most recent call last):
2025-12-04T12:12:57.8516009Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8516202Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8516406Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8516411Z 
2025-12-04T12:12:57.8516634Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8517576Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8517581Z 
2025-12-04T12:12:57.8517850Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8518062Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8518170Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8518293Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8518651Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8518911Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8519021Z graph_break []
2025-12-04T12:12:57.8519237Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8519971Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8520068Z   warnings.warn(
2025-12-04T12:12:57.8520275Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8520396Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8520509Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8520723Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8521063Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8521200Z graph_break []
2025-12-04T12:12:57.8521425Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8522241Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8522346Z   warnings.warn(
2025-12-04T12:12:57.8522571Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8522678Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8522791Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8523019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8523376Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8523490Z graph_break []
2025-12-04T12:12:57.8523702Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8524409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8524522Z   warnings.warn(
2025-12-04T12:12:57.8525322Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-26ee580f1806e0f2.xml -
2025-12-04T12:12:57.8525552Z =========================== short test summary info ============================
2025-12-04T12:12:57.8526627Z FAILED [0.1616s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8526634Z 
2025-12-04T12:12:57.8526846Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8527810Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8527816Z 
2025-12-04T12:12:57.8528077Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8528271Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8528485Z ============ 1 failed, 12 skipped, 117 deselected, 2 rerun in 4.98s ============
2025-12-04T12:12:57.8528583Z Got exit code 1
2025-12-04T12:12:57.8528703Z Retrying single test...
2025-12-04T12:12:57.8529334Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b7cfd41a69868cc6.xml
2025-12-04T12:12:57.8529508Z ============================= test session starts ==============================
2025-12-04T12:12:57.8529853Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8529965Z cachedir: .pytest_cache
2025-12-04T12:12:57.8530491Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8530614Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8530726Z configfile: pytest.ini
2025-12-04T12:12:57.8531321Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8531545Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8532583Z stepcurrent: skipping 129 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8532694Z Running 1 items in this shard
2025-12-04T12:12:57.8532740Z 
2025-12-04T12:12:57.8533649Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.6158s] [100%]
2025-12-04T12:12:57.8534600Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1657s] [100%]
2025-12-04T12:12:57.8535429Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1615s] [100%]
2025-12-04T12:12:57.8535435Z 
2025-12-04T12:12:57.8535616Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8536183Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8536320Z Traceback (most recent call last):
2025-12-04T12:12:57.8536781Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8536977Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8537209Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8537326Z 
2025-12-04T12:12:57.8537536Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8538495Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8538500Z 
2025-12-04T12:12:57.8538760Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8538976Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8539102Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8539217Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8539546Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8539774Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8539873Z graph_break []
2025-12-04T12:12:57.8540103Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8540821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8540920Z   warnings.warn(
2025-12-04T12:12:57.8541501Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8541619Z Traceback (most recent call last):
2025-12-04T12:12:57.8542087Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8542285Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8542493Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8542498Z 
2025-12-04T12:12:57.8542728Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8543677Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8543681Z 
2025-12-04T12:12:57.8543956Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8544169Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8544277Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8544402Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8544774Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8544990Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8545100Z graph_break []
2025-12-04T12:12:57.8545310Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8546074Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8546175Z   warnings.warn(
2025-12-04T12:12:57.8546384Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8546507Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8546617Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8546858Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8547202Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8547299Z graph_break []
2025-12-04T12:12:57.8547510Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8548231Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8548358Z   warnings.warn(
2025-12-04T12:12:57.8548510Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8549078Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8549195Z Traceback (most recent call last):
2025-12-04T12:12:57.8549674Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8549869Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8550090Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8550095Z 
2025-12-04T12:12:57.8550304Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8551250Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8551256Z 
2025-12-04T12:12:57.8551529Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8551741Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8551865Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8551980Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8552310Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8552538Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8552636Z graph_break []
2025-12-04T12:12:57.8552847Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8553577Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8553681Z   warnings.warn(
2025-12-04T12:12:57.8553904Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8554014Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8554127Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8554355Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8554687Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8554783Z graph_break []
2025-12-04T12:12:57.8555004Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8555751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8555864Z   warnings.warn(
2025-12-04T12:12:57.8556073Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8556214Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8556337Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8556551Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8556879Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8556988Z graph_break []
2025-12-04T12:12:57.8557195Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8557948Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8558048Z   warnings.warn(
2025-12-04T12:12:57.8558844Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b7cfd41a69868cc6.xml -
2025-12-04T12:12:57.8559026Z =========================== short test summary info ============================
2025-12-04T12:12:57.8560127Z FAILED [0.1615s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8560133Z 
2025-12-04T12:12:57.8560359Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8561297Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8561304Z 
2025-12-04T12:12:57.8561563Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8561754Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8561950Z ================== 1 failed, 174 deselected, 2 rerun in 5.00s ==================
2025-12-04T12:12:57.8562064Z Got exit code 1
2025-12-04T12:12:57.8562248Z Retrying single test...
2025-12-04T12:12:57.8562883Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4573f1e428dcb095.xml
2025-12-04T12:12:57.8563058Z ============================= test session starts ==============================
2025-12-04T12:12:57.8563402Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8563509Z cachedir: .pytest_cache
2025-12-04T12:12:57.8564031Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8564153Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8564273Z configfile: pytest.ini
2025-12-04T12:12:57.8564847Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8565073Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8566111Z stepcurrent: skipping 129 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8566224Z Running 1 items in this shard
2025-12-04T12:12:57.8566231Z 
2025-12-04T12:12:57.8567147Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5680s] [100%]
2025-12-04T12:12:57.8568098Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1608s] [100%]
2025-12-04T12:12:57.8568966Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1591s] [100%]
2025-12-04T12:12:57.8568974Z 
2025-12-04T12:12:57.8569112Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8569676Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8569851Z Traceback (most recent call last):
2025-12-04T12:12:57.8570311Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8570519Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8570723Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8570727Z 
2025-12-04T12:12:57.8570936Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8571887Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8571924Z 
2025-12-04T12:12:57.8572184Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8572410Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8572519Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8572632Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8572973Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8573190Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8573285Z graph_break []
2025-12-04T12:12:57.8573510Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8574234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8574350Z   warnings.warn(
2025-12-04T12:12:57.8574911Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8575030Z Traceback (most recent call last):
2025-12-04T12:12:57.8575505Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8575697Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8575902Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8575923Z 
2025-12-04T12:12:57.8576130Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8577076Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8577083Z 
2025-12-04T12:12:57.8577356Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8577568Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8577678Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8577800Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8578132Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8578357Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8578488Z graph_break []
2025-12-04T12:12:57.8578697Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8579430Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8579530Z   warnings.warn(
2025-12-04T12:12:57.8579775Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8579896Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8580009Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8580234Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8580568Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8580693Z graph_break []
2025-12-04T12:12:57.8580917Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8581652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8581754Z   warnings.warn(
2025-12-04T12:12:57.8581906Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8582470Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8582638Z Traceback (most recent call last):
2025-12-04T12:12:57.8583094Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8583289Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8583512Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8583517Z 
2025-12-04T12:12:57.8583726Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8584680Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8584685Z 
2025-12-04T12:12:57.8584945Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8585160Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8585285Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8585397Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8585742Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8585957Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8586053Z graph_break []
2025-12-04T12:12:57.8586280Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8586995Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8587096Z   warnings.warn(
2025-12-04T12:12:57.8587321Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8587430Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8587563Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8587779Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8588108Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8588218Z graph_break []
2025-12-04T12:12:57.8588428Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8589141Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8589295Z   warnings.warn(
2025-12-04T12:12:57.8589505Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8589629Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8589739Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8589952Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8590327Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8590425Z graph_break []
2025-12-04T12:12:57.8590632Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8591354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8591478Z   warnings.warn(
2025-12-04T12:12:57.8592288Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4573f1e428dcb095.xml -
2025-12-04T12:12:57.8592462Z =========================== short test summary info ============================
2025-12-04T12:12:57.8593530Z FAILED [0.1591s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8593565Z 
2025-12-04T12:12:57.8593791Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8594730Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8594735Z 
2025-12-04T12:12:57.8595008Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8595185Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8595383Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.8595489Z Got exit code 1
2025-12-04T12:12:57.8596346Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8596759Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.8597381Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2c1257cd859214a9.xml
2025-12-04T12:12:57.8597541Z ============================= test session starts ==============================
2025-12-04T12:12:57.8597899Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8598004Z cachedir: .pytest_cache
2025-12-04T12:12:57.8598528Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8598646Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8598751Z configfile: pytest.ini
2025-12-04T12:12:57.8599339Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8599566Z collecting ... collected 380 items / 130 deselected / 250 selected
2025-12-04T12:12:57.8599710Z stepcurrent: skipping 130 already run items.
2025-12-04T12:12:57.8599833Z Running 45 items in this shard
2025-12-04T12:12:57.8599838Z 
2025-12-04T12:12:57.8601080Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  2%]
2025-12-04T12:12:57.8602171Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0028s] (Skip non-critical tests to save resources.) [  4%]
2025-12-04T12:12:57.8603333Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0034s] (Skip non-critical tests to save resources.) [  6%]
2025-12-04T12:12:57.8604359Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [  8%]
2025-12-04T12:12:57.8605303Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5510s] [ 11%]
2025-12-04T12:12:57.8606214Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1624s] [ 11%]
2025-12-04T12:12:57.8607031Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1615s] [ 11%]
2025-12-04T12:12:57.8607083Z 
2025-12-04T12:12:57.8607227Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8607800Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8607919Z Traceback (most recent call last):
2025-12-04T12:12:57.8608398Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8608593Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8608803Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8608808Z 
2025-12-04T12:12:57.8609034Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8609968Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8609975Z 
2025-12-04T12:12:57.8610249Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8610465Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8610576Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8610701Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8611034Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8611245Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8611359Z graph_break []
2025-12-04T12:12:57.8611569Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8612302Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8612401Z   warnings.warn(
2025-12-04T12:12:57.8612957Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8613087Z Traceback (most recent call last):
2025-12-04T12:12:57.8613545Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8613751Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8613958Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8614012Z 
2025-12-04T12:12:57.8614224Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8615164Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8615171Z 
2025-12-04T12:12:57.8615482Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8615707Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8615814Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8615927Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8616274Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8616519Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8616619Z graph_break []
2025-12-04T12:12:57.8616843Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8617564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8617679Z   warnings.warn(
2025-12-04T12:12:57.8617892Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8618033Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8618157Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8618373Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8618701Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8618809Z graph_break []
2025-12-04T12:12:57.8619023Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8619743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8619842Z   warnings.warn(
2025-12-04T12:12:57.8619981Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8620550Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8620669Z Traceback (most recent call last):
2025-12-04T12:12:57.8621125Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8621332Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8621537Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8621542Z 
2025-12-04T12:12:57.8621764Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8622695Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8622702Z 
2025-12-04T12:12:57.8622961Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8623185Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8623294Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8623418Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8623747Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8623960Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8624067Z graph_break []
2025-12-04T12:12:57.8624276Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8624989Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8625133Z   warnings.warn(
2025-12-04T12:12:57.8625342Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8625463Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8625575Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8625821Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8626163Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8626258Z graph_break []
2025-12-04T12:12:57.8626465Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8627215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8627313Z   warnings.warn(
2025-12-04T12:12:57.8627530Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8627639Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8627749Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8627972Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8628301Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8628427Z graph_break []
2025-12-04T12:12:57.8628645Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8629350Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8629458Z   warnings.warn(
2025-12-04T12:12:57.8630254Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2c1257cd859214a9.xml -
2025-12-04T12:12:57.8630421Z =========================== short test summary info ============================
2025-12-04T12:12:57.8631491Z FAILED [0.1615s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8631501Z 
2025-12-04T12:12:57.8631711Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8632656Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8632661Z 
2025-12-04T12:12:57.8632922Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8633095Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8633321Z ============ 1 failed, 4 skipped, 130 deselected, 2 rerun in 4.94s =============
2025-12-04T12:12:57.8633420Z Got exit code 1
2025-12-04T12:12:57.8633537Z Retrying single test...
2025-12-04T12:12:57.8634161Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6b4b9f12b6851f04.xml
2025-12-04T12:12:57.8634326Z ============================= test session starts ==============================
2025-12-04T12:12:57.8634680Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8634787Z cachedir: .pytest_cache
2025-12-04T12:12:57.8635298Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8635437Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8635546Z configfile: pytest.ini
2025-12-04T12:12:57.8636139Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8636399Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8637440Z stepcurrent: skipping 134 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8637567Z Running 1 items in this shard
2025-12-04T12:12:57.8637572Z 
2025-12-04T12:12:57.8638477Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5556s] [100%]
2025-12-04T12:12:57.8639415Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1615s] [100%]
2025-12-04T12:12:57.8640233Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1590s] [100%]
2025-12-04T12:12:57.8640239Z 
2025-12-04T12:12:57.8640386Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8640977Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8641095Z Traceback (most recent call last):
2025-12-04T12:12:57.8641567Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8641760Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8641981Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8641985Z 
2025-12-04T12:12:57.8642277Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8643219Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8643224Z 
2025-12-04T12:12:57.8643497Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8643713Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8643836Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8643948Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8644279Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8644509Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8644607Z graph_break []
2025-12-04T12:12:57.8644819Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8645554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8645658Z   warnings.warn(
2025-12-04T12:12:57.8646229Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8646352Z Traceback (most recent call last):
2025-12-04T12:12:57.8646812Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8647020Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8647226Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8647231Z 
2025-12-04T12:12:57.8647445Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8648398Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8648447Z 
2025-12-04T12:12:57.8648708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8648934Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8649076Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8649190Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8649537Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8649754Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8649865Z graph_break []
2025-12-04T12:12:57.8650077Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8650828Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8650946Z   warnings.warn(
2025-12-04T12:12:57.8651157Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8651265Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8651392Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8651610Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8651988Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8652087Z graph_break []
2025-12-04T12:12:57.8652298Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8653026Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8653125Z   warnings.warn(
2025-12-04T12:12:57.8653265Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8653839Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8653955Z Traceback (most recent call last):
2025-12-04T12:12:57.8654429Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8654625Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8654828Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8654832Z 
2025-12-04T12:12:57.8655055Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8655988Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8655993Z 
2025-12-04T12:12:57.8656263Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8656474Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8656584Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8656707Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8657041Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8657268Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8657363Z graph_break []
2025-12-04T12:12:57.8657569Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8658295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8658394Z   warnings.warn(
2025-12-04T12:12:57.8658602Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8658776Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8658887Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8659101Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8659440Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8659536Z graph_break []
2025-12-04T12:12:57.8659788Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8660505Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8660603Z   warnings.warn(
2025-12-04T12:12:57.8660823Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8660963Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8661077Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8661306Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8661640Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8661747Z graph_break []
2025-12-04T12:12:57.8661959Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8662669Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8662813Z   warnings.warn(
2025-12-04T12:12:57.8663610Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6b4b9f12b6851f04.xml -
2025-12-04T12:12:57.8663794Z =========================== short test summary info ============================
2025-12-04T12:12:57.8664857Z FAILED [0.1590s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8664864Z 
2025-12-04T12:12:57.8665075Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8666025Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8666032Z 
2025-12-04T12:12:57.8666292Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8666485Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8666678Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.8666777Z Got exit code 1
2025-12-04T12:12:57.8666896Z Retrying single test...
2025-12-04T12:12:57.8667531Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-469eeaa86aae0ce8.xml
2025-12-04T12:12:57.8667703Z ============================= test session starts ==============================
2025-12-04T12:12:57.8668041Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8668152Z cachedir: .pytest_cache
2025-12-04T12:12:57.8668667Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8668786Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8668889Z configfile: pytest.ini
2025-12-04T12:12:57.8669476Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8669700Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8670738Z stepcurrent: skipping 134 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8670890Z Running 1 items in this shard
2025-12-04T12:12:57.8670896Z 
2025-12-04T12:12:57.8671829Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5760s] [100%]
2025-12-04T12:12:57.8672742Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1640s] [100%]
2025-12-04T12:12:57.8673600Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1588s] [100%]
2025-12-04T12:12:57.8673605Z 
2025-12-04T12:12:57.8673762Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8674319Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8674449Z Traceback (most recent call last):
2025-12-04T12:12:57.8674912Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8675139Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8675359Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8675364Z 
2025-12-04T12:12:57.8675572Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8676519Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8676526Z 
2025-12-04T12:12:57.8676784Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8676993Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8677114Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8677229Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8677564Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8677788Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8677882Z graph_break []
2025-12-04T12:12:57.8678109Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8678826Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8678924Z   warnings.warn(
2025-12-04T12:12:57.8679490Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8679614Z Traceback (most recent call last):
2025-12-04T12:12:57.8680083Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8680280Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8680486Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8680491Z 
2025-12-04T12:12:57.8680711Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8681639Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8681644Z 
2025-12-04T12:12:57.8681917Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8682232Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8682345Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8682473Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8682801Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8683054Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8683166Z graph_break []
2025-12-04T12:12:57.8683377Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8684106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8684206Z   warnings.warn(
2025-12-04T12:12:57.8684445Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8684569Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8684684Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8684896Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8685235Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8685331Z graph_break []
2025-12-04T12:12:57.8685553Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8686340Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8686438Z   warnings.warn(
2025-12-04T12:12:57.8686589Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8687150Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.8687268Z Traceback (most recent call last):
2025-12-04T12:12:57.8687742Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8687934Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8688156Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8688161Z 
2025-12-04T12:12:57.8688373Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8689307Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8689326Z 
2025-12-04T12:12:57.8689583Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8689795Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8689918Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8690030Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8690357Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8690584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8690680Z graph_break []
2025-12-04T12:12:57.8690892Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8691620Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8691717Z   warnings.warn(
2025-12-04T12:12:57.8691940Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8692046Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8692157Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8692383Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8692710Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8692840Z graph_break []
2025-12-04T12:12:57.8693061Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8693806Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8693921Z   warnings.warn(
2025-12-04T12:12:57.8694129Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8694236Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8694359Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8694572Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8694928Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8695037Z graph_break []
2025-12-04T12:12:57.8695244Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8695964Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8696065Z   warnings.warn(
2025-12-04T12:12:57.8696869Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-469eeaa86aae0ce8.xml -
2025-12-04T12:12:57.8697098Z =========================== short test summary info ============================
2025-12-04T12:12:57.8698162Z FAILED [0.1588s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8698168Z 
2025-12-04T12:12:57.8698388Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8699322Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8699328Z 
2025-12-04T12:12:57.8699586Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8699776Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8699971Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ==================
2025-12-04T12:12:57.8700082Z Got exit code 1
2025-12-04T12:12:57.8701134Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.8701538Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.8702185Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f89f3afb1f628785.xml
2025-12-04T12:12:57.8702345Z ============================= test session starts ==============================
2025-12-04T12:12:57.8702717Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8702827Z cachedir: .pytest_cache
2025-12-04T12:12:57.8703336Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8703473Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8703580Z configfile: pytest.ini
2025-12-04T12:12:57.8704159Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8704396Z collecting ... collected 380 items / 135 deselected / 245 selected
2025-12-04T12:12:57.8704630Z stepcurrent: skipping 135 already run items.
2025-12-04T12:12:57.8704760Z Running 40 items in this shard
2025-12-04T12:12:57.8704765Z 
2025-12-04T12:12:57.8705702Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5947s] [  2%]
2025-12-04T12:12:57.8706617Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1689s] [  2%]
2025-12-04T12:12:57.8707431Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1658s] [  2%]
2025-12-04T12:12:57.8707476Z 
2025-12-04T12:12:57.8707615Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8708188Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.8708309Z Traceback (most recent call last):
2025-12-04T12:12:57.8708786Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8709026Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8709232Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8709237Z 
2025-12-04T12:12:57.8709463Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8710399Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8710404Z 
2025-12-04T12:12:57.8710680Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8710896Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8711008Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8711135Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8711467Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8711686Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8711799Z graph_break []
2025-12-04T12:12:57.8712012Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8714688Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8714795Z   return x.grad, w.grad
2025-12-04T12:12:57.8715524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8715623Z   warnings.warn(
2025-12-04T12:12:57.8718267Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8718426Z   return x.grad, w.grad
2025-12-04T12:12:57.8718979Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.8719146Z Traceback (most recent call last):
2025-12-04T12:12:57.8719609Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8719802Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8720023Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8720028Z 
2025-12-04T12:12:57.8720241Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8721293Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8721301Z 
2025-12-04T12:12:57.8721561Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8721787Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8721899Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8722048Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8722458Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8722673Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8722768Z graph_break []
2025-12-04T12:12:57.8722992Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8725646Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8725767Z   return x.grad, w.grad
2025-12-04T12:12:57.8726478Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8726574Z   warnings.warn(
2025-12-04T12:12:57.8729228Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8729332Z   return x.grad, w.grad
2025-12-04T12:12:57.8729565Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8729674Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8729801Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8730014Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8730345Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8730459Z graph_break []
2025-12-04T12:12:57.8730668Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8733352Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8733486Z   return x.grad, w.grad
2025-12-04T12:12:57.8734199Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8734337Z   warnings.warn(
2025-12-04T12:12:57.8736986Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8737136Z   return x.grad, w.grad
2025-12-04T12:12:57.8737275Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8737842Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.8737961Z Traceback (most recent call last):
2025-12-04T12:12:57.8738417Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8738626Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8738831Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8738837Z 
2025-12-04T12:12:57.8739058Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8739995Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8740002Z 
2025-12-04T12:12:57.8740260Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8740484Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8740592Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8740707Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8741049Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8741262Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8741371Z graph_break []
2025-12-04T12:12:57.8741578Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8744228Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8744346Z   return x.grad, w.grad
2025-12-04T12:12:57.8745059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8745204Z   warnings.warn(
2025-12-04T12:12:57.8747866Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8747986Z   return x.grad, w.grad
2025-12-04T12:12:57.8748239Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8748349Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8748477Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8748693Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8749037Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8749132Z graph_break []
2025-12-04T12:12:57.8749382Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8752047Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8752150Z   return x.grad, w.grad
2025-12-04T12:12:57.8752878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8752974Z   warnings.warn(
2025-12-04T12:12:57.8755626Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8755728Z   return x.grad, w.grad
2025-12-04T12:12:57.8755938Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8756058Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8756170Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8756400Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8756732Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8756828Z graph_break []
2025-12-04T12:12:57.8757051Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8757764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8757861Z   warnings.warn(
2025-12-04T12:12:57.8760543Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8760682Z   return x.grad, w.grad
2025-12-04T12:12:57.8761498Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f89f3afb1f628785.xml -
2025-12-04T12:12:57.8761667Z =========================== short test summary info ============================
2025-12-04T12:12:57.8762838Z FAILED [0.1658s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8762847Z 
2025-12-04T12:12:57.8763059Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8763993Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8764047Z 
2025-12-04T12:12:57.8764309Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8764485Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8764693Z ================== 1 failed, 135 deselected, 2 rerun in 4.98s ==================
2025-12-04T12:12:57.8764790Z Got exit code 1
2025-12-04T12:12:57.8764896Z Retrying single test...
2025-12-04T12:12:57.8765537Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85fcc5c00efd74bd.xml
2025-12-04T12:12:57.8765697Z ============================= test session starts ==============================
2025-12-04T12:12:57.8766053Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8766159Z cachedir: .pytest_cache
2025-12-04T12:12:57.8766669Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8766802Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8766906Z configfile: pytest.ini
2025-12-04T12:12:57.8767482Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8767719Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8768735Z stepcurrent: skipping 135 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8768863Z Running 1 items in this shard
2025-12-04T12:12:57.8768868Z 
2025-12-04T12:12:57.8769765Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5585s] [100%]
2025-12-04T12:12:57.8770672Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1662s] [100%]
2025-12-04T12:12:57.8771488Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1620s] [100%]
2025-12-04T12:12:57.8771494Z 
2025-12-04T12:12:57.8771630Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8772225Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.8772344Z Traceback (most recent call last):
2025-12-04T12:12:57.8772843Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8773039Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8773243Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8773248Z 
2025-12-04T12:12:57.8773467Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8774430Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8774435Z 
2025-12-04T12:12:57.8774708Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8774922Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8775031Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8775155Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8775488Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8775745Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8775842Z graph_break []
2025-12-04T12:12:57.8776051Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8778729Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8778838Z   return x.grad, w.grad
2025-12-04T12:12:57.8779573Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8779677Z   warnings.warn(
2025-12-04T12:12:57.8782330Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8782436Z   return x.grad, w.grad
2025-12-04T12:12:57.8782992Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.8783130Z Traceback (most recent call last):
2025-12-04T12:12:57.8783589Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8783800Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8784007Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8784012Z 
2025-12-04T12:12:57.8784224Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8785169Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8785227Z 
2025-12-04T12:12:57.8785487Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8785740Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8785853Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8785967Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8791619Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8791882Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8791984Z graph_break []
2025-12-04T12:12:57.8792303Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8794977Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8795152Z   return x.grad, w.grad
2025-12-04T12:12:57.8795878Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8795977Z   warnings.warn(
2025-12-04T12:12:57.8798642Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8798752Z   return x.grad, w.grad
2025-12-04T12:12:57.8798983Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8799092Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8799222Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8799444Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8799777Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8799891Z graph_break []
2025-12-04T12:12:57.8800106Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8803089Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8803201Z   return x.grad, w.grad
2025-12-04T12:12:57.8803926Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8804042Z   warnings.warn(
2025-12-04T12:12:57.8806750Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8806940Z   return x.grad, w.grad
2025-12-04T12:12:57.8807085Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8807696Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.8807818Z Traceback (most recent call last):
2025-12-04T12:12:57.8808270Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8808496Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8808708Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8808714Z 
2025-12-04T12:12:57.8808927Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8809915Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8809921Z 
2025-12-04T12:12:57.8810180Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8810410Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8810519Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8810630Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8810977Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8811193Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8811288Z graph_break []
2025-12-04T12:12:57.8811513Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8814158Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8814277Z   return x.grad, w.grad
2025-12-04T12:12:57.8814996Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8815108Z   warnings.warn(
2025-12-04T12:12:57.8817737Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8817853Z   return x.grad, w.grad
2025-12-04T12:12:57.8818066Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8818208Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8818335Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8818554Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8818887Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8819028Z graph_break []
2025-12-04T12:12:57.8819239Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8821922Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8822029Z   return x.grad, w.grad
2025-12-04T12:12:57.8822756Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8822888Z   warnings.warn(
2025-12-04T12:12:57.8825515Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8825634Z   return x.grad, w.grad
2025-12-04T12:12:57.8825843Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8825962Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8826073Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8826291Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8826635Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8826730Z graph_break []
2025-12-04T12:12:57.8826939Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8827663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8827761Z   warnings.warn(
2025-12-04T12:12:57.8830400Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8830508Z   return x.grad, w.grad
2025-12-04T12:12:57.8831324Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85fcc5c00efd74bd.xml -
2025-12-04T12:12:57.8831495Z =========================== short test summary info ============================
2025-12-04T12:12:57.8832564Z FAILED [0.1620s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8832616Z 
2025-12-04T12:12:57.8832827Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8833787Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8833795Z 
2025-12-04T12:12:57.8834069Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8834244Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8834514Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.8834612Z Got exit code 1
2025-12-04T12:12:57.8834715Z Retrying single test...
2025-12-04T12:12:57.8835352Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dcb0b47762861151.xml
2025-12-04T12:12:57.8835512Z ============================= test session starts ==============================
2025-12-04T12:12:57.8835853Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8836006Z cachedir: .pytest_cache
2025-12-04T12:12:57.8836514Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8836646Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8836753Z configfile: pytest.ini
2025-12-04T12:12:57.8837332Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8837570Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8838584Z stepcurrent: skipping 135 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8838710Z Running 1 items in this shard
2025-12-04T12:12:57.8838715Z 
2025-12-04T12:12:57.8839608Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.6065s] [100%]
2025-12-04T12:12:57.8840502Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1695s] [100%]
2025-12-04T12:12:57.8841327Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1666s] [100%]
2025-12-04T12:12:57.8841335Z 
2025-12-04T12:12:57.8841471Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8842036Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.8842220Z Traceback (most recent call last):
2025-12-04T12:12:57.8842686Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8842898Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8843106Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8843112Z 
2025-12-04T12:12:57.8843334Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8844270Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8844313Z 
2025-12-04T12:12:57.8844588Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8844804Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8844915Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8845082Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8845416Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8845631Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8845740Z graph_break []
2025-12-04T12:12:57.8845950Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8848641Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8848775Z   return x.grad, w.grad
2025-12-04T12:12:57.8849493Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8849603Z   warnings.warn(
2025-12-04T12:12:57.8852220Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8852338Z   return x.grad, w.grad
2025-12-04T12:12:57.8852893Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.8853023Z Traceback (most recent call last):
2025-12-04T12:12:57.8853480Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8853673Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8853894Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8853901Z 
2025-12-04T12:12:57.8854107Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8855056Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8855062Z 
2025-12-04T12:12:57.8855321Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8855536Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8855659Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8855774Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8856120Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8856334Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8856431Z graph_break []
2025-12-04T12:12:57.8856656Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8859352Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8859502Z   return x.grad, w.grad
2025-12-04T12:12:57.8860218Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8860345Z   warnings.warn(
2025-12-04T12:12:57.8862999Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8863136Z   return x.grad, w.grad
2025-12-04T12:12:57.8863364Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8863470Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8863584Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8863821Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8864151Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8864261Z graph_break []
2025-12-04T12:12:57.8864470Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8867104Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8867220Z   return x.grad, w.grad
2025-12-04T12:12:57.8867930Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8868042Z   warnings.warn(
2025-12-04T12:12:57.8870680Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8870795Z   return x.grad, w.grad
2025-12-04T12:12:57.8870935Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8871490Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.8871622Z Traceback (most recent call last):
2025-12-04T12:12:57.8872115Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8872320Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8872524Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8872531Z 
2025-12-04T12:12:57.8872772Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8873715Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8873721Z 
2025-12-04T12:12:57.8873981Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8874233Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8874342Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8874456Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8874800Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8875014Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8875121Z graph_break []
2025-12-04T12:12:57.8875339Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8878017Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8878139Z   return x.grad, w.grad
2025-12-04T12:12:57.8878848Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8878962Z   warnings.warn(
2025-12-04T12:12:57.8881592Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8881708Z   return x.grad, w.grad
2025-12-04T12:12:57.8881918Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8882026Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8882229Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8882450Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8882787Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8882898Z graph_break []
2025-12-04T12:12:57.8883108Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8885762Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8885910Z   return x.grad, w.grad
2025-12-04T12:12:57.8886667Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8886769Z   warnings.warn(
2025-12-04T12:12:57.8889439Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8889560Z   return x.grad, w.grad
2025-12-04T12:12:57.8889771Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8889895Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8890036Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8890254Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8890595Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8890690Z graph_break []
2025-12-04T12:12:57.8890911Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8891628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8891725Z   warnings.warn(
2025-12-04T12:12:57.8894378Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.8894485Z   return x.grad, w.grad
2025-12-04T12:12:57.8895299Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dcb0b47762861151.xml -
2025-12-04T12:12:57.8895467Z =========================== short test summary info ============================
2025-12-04T12:12:57.8896538Z FAILED [0.1666s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8896543Z 
2025-12-04T12:12:57.8896757Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8897690Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8897696Z 
2025-12-04T12:12:57.8897969Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8898145Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8898358Z ================== 1 failed, 174 deselected, 2 rerun in 5.00s ==================
2025-12-04T12:12:57.8898455Z Got exit code 1
2025-12-04T12:12:57.8899335Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.8899749Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.8900421Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31949d00d4596283.xml
2025-12-04T12:12:57.8900594Z ============================= test session starts ==============================
2025-12-04T12:12:57.8901196Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8901304Z cachedir: .pytest_cache
2025-12-04T12:12:57.8901896Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8902018Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8902123Z configfile: pytest.ini
2025-12-04T12:12:57.8902713Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8902934Z collecting ... collected 380 items / 136 deselected / 244 selected
2025-12-04T12:12:57.8903151Z stepcurrent: skipping 136 already run items.
2025-12-04T12:12:57.8903263Z Running 39 items in this shard
2025-12-04T12:12:57.8903269Z 
2025-12-04T12:12:57.8904281Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0041s] (Skip non-critical tests to save resources.) [  2%]
2025-12-04T12:12:57.8905301Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0032s] (Skip non-critical tests to save resources.) [  5%]
2025-12-04T12:12:57.8906302Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  7%]
2025-12-04T12:12:57.8907209Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5874s] [ 10%]
2025-12-04T12:12:57.8908098Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1663s] [ 10%]
2025-12-04T12:12:57.8908927Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1649s] [ 10%]
2025-12-04T12:12:57.8908934Z 
2025-12-04T12:12:57.8909071Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8909647Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8909769Z Traceback (most recent call last):
2025-12-04T12:12:57.8910237Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8910443Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8910650Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8910655Z 
2025-12-04T12:12:57.8910863Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8911818Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8911868Z 
2025-12-04T12:12:57.8912128Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8912356Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8912469Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8912583Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8912969Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8913186Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8913298Z graph_break []
2025-12-04T12:12:57.8913508Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8914259Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8914372Z   warnings.warn(
2025-12-04T12:12:57.8914928Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8915045Z Traceback (most recent call last):
2025-12-04T12:12:57.8915513Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8915741Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8915960Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8915965Z 
2025-12-04T12:12:57.8916173Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8917108Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8917127Z 
2025-12-04T12:12:57.8917385Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8917598Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8917721Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8917833Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8918167Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8918399Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8918494Z graph_break []
2025-12-04T12:12:57.8918701Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8919433Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8919537Z   warnings.warn(
2025-12-04T12:12:57.8919757Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8919865Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8919978Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8920203Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8920533Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8920631Z graph_break []
2025-12-04T12:12:57.8920858Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8921564Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8921675Z   warnings.warn(
2025-12-04T12:12:57.8921813Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8922455Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8922590Z Traceback (most recent call last):
2025-12-04T12:12:57.8923084Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8923292Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8923501Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8923509Z 
2025-12-04T12:12:57.8923749Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8924696Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8924701Z 
2025-12-04T12:12:57.8924961Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8925214Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8925322Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8925435Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8925776Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8925987Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8926081Z graph_break []
2025-12-04T12:12:57.8926308Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8927051Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8927162Z   warnings.warn(
2025-12-04T12:12:57.8927372Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8927480Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8927608Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8927819Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8928149Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8928254Z graph_break []
2025-12-04T12:12:57.8928464Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8929185Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8929284Z   warnings.warn(
2025-12-04T12:12:57.8929491Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8929613Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8929726Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8929938Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8930277Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8930370Z graph_break []
2025-12-04T12:12:57.8930578Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8931295Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8931390Z   warnings.warn(
2025-12-04T12:12:57.8932199Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31949d00d4596283.xml -
2025-12-04T12:12:57.8932363Z =========================== short test summary info ============================
2025-12-04T12:12:57.8936160Z FAILED [0.1649s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8936180Z 
2025-12-04T12:12:57.8936420Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8937490Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8937495Z 
2025-12-04T12:12:57.8937772Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8937948Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8938167Z ============ 1 failed, 3 skipped, 136 deselected, 2 rerun in 4.99s =============
2025-12-04T12:12:57.8938277Z Got exit code 1
2025-12-04T12:12:57.8938382Z Retrying single test...
2025-12-04T12:12:57.8939047Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b6b2b2997a48fffb.xml
2025-12-04T12:12:57.8939264Z ============================= test session starts ==============================
2025-12-04T12:12:57.8939613Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8939740Z cachedir: .pytest_cache
2025-12-04T12:12:57.8940244Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8940364Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8940516Z configfile: pytest.ini
2025-12-04T12:12:57.8941089Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8941309Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8942354Z stepcurrent: skipping 139 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8942466Z Running 1 items in this shard
2025-12-04T12:12:57.8942473Z 
2025-12-04T12:12:57.8943382Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5868s] [100%]
2025-12-04T12:12:57.8944276Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1619s] [100%]
2025-12-04T12:12:57.8945104Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1601s] [100%]
2025-12-04T12:12:57.8945111Z 
2025-12-04T12:12:57.8945248Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8945806Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8945941Z Traceback (most recent call last):
2025-12-04T12:12:57.8946400Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8946602Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8946812Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8946817Z 
2025-12-04T12:12:57.8947027Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8947971Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8947976Z 
2025-12-04T12:12:57.8948308Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8948538Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8948677Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8948793Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8949134Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8949348Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8949445Z graph_break []
2025-12-04T12:12:57.8949669Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8950389Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8950504Z   warnings.warn(
2025-12-04T12:12:57.8951118Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8951239Z Traceback (most recent call last):
2025-12-04T12:12:57.8951712Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8951908Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8952116Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8952135Z 
2025-12-04T12:12:57.8952342Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8953801Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8953806Z 
2025-12-04T12:12:57.8954080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8954296Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8954424Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8954538Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8954875Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8955105Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8955202Z graph_break []
2025-12-04T12:12:57.8955412Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8956144Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8956240Z   warnings.warn(
2025-12-04T12:12:57.8956474Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8956580Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8956691Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8956923Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8957251Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8957348Z graph_break []
2025-12-04T12:12:57.8957569Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8958282Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8958398Z   warnings.warn(
2025-12-04T12:12:57.8958537Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8959097Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8959230Z Traceback (most recent call last):
2025-12-04T12:12:57.8959738Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8959945Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8960180Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8960186Z 
2025-12-04T12:12:57.8960393Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8961343Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8961350Z 
2025-12-04T12:12:57.8961609Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8961836Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8961944Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8962056Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8962545Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8962763Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8962861Z graph_break []
2025-12-04T12:12:57.8963085Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8963797Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8963943Z   warnings.warn(
2025-12-04T12:12:57.8964152Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8964259Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8964385Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8964598Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8964929Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8965041Z graph_break []
2025-12-04T12:12:57.8965254Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8965975Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8966073Z   warnings.warn(
2025-12-04T12:12:57.8966284Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8966406Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8966516Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8966734Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8967065Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8967161Z graph_break []
2025-12-04T12:12:57.8967371Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8968088Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8968188Z   warnings.warn(
2025-12-04T12:12:57.8968998Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b6b2b2997a48fffb.xml -
2025-12-04T12:12:57.8969163Z =========================== short test summary info ============================
2025-12-04T12:12:57.8970231Z FAILED [0.1601s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8970248Z 
2025-12-04T12:12:57.8970458Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8971423Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8971460Z 
2025-12-04T12:12:57.8971730Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8971902Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.8972108Z ================== 1 failed, 174 deselected, 2 rerun in 4.96s ==================
2025-12-04T12:12:57.8972204Z Got exit code 1
2025-12-04T12:12:57.8972307Z Retrying single test...
2025-12-04T12:12:57.8972942Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b63fc96940c5dfca.xml
2025-12-04T12:12:57.8973097Z ============================= test session starts ==============================
2025-12-04T12:12:57.8973464Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.8973583Z cachedir: .pytest_cache
2025-12-04T12:12:57.8974089Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.8974224Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.8974323Z configfile: pytest.ini
2025-12-04T12:12:57.8974897Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.8975166Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.8976181Z stepcurrent: skipping 139 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8976305Z Running 1 items in this shard
2025-12-04T12:12:57.8976310Z 
2025-12-04T12:12:57.8977207Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5539s] [100%]
2025-12-04T12:12:57.8978102Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1620s] [100%]
2025-12-04T12:12:57.8978928Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1552s] [100%]
2025-12-04T12:12:57.8978936Z 
2025-12-04T12:12:57.8979071Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.8979634Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8979754Z Traceback (most recent call last):
2025-12-04T12:12:57.8980215Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8980419Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8980624Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8980629Z 
2025-12-04T12:12:57.8980848Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8981780Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8981787Z 
2025-12-04T12:12:57.8982056Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8982266Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8982373Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8982497Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8982866Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8983112Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8983216Z graph_break []
2025-12-04T12:12:57.8983423Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8984153Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8984254Z   warnings.warn(
2025-12-04T12:12:57.8984810Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8984940Z Traceback (most recent call last):
2025-12-04T12:12:57.8985430Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8985631Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8985849Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8985856Z 
2025-12-04T12:12:57.8986065Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8987013Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8987048Z 
2025-12-04T12:12:57.8987307Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8987516Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8987634Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8987745Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8988089Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8988301Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8988399Z graph_break []
2025-12-04T12:12:57.8988620Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8989333Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8989435Z   warnings.warn(
2025-12-04T12:12:57.8989654Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8989762Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8989886Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8990094Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8990424Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8990533Z graph_break []
2025-12-04T12:12:57.8990741Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8991450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8991560Z   warnings.warn(
2025-12-04T12:12:57.8991698Z =================================== FAILURES ===================================
2025-12-04T12:12:57.8992267Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.8992385Z Traceback (most recent call last):
2025-12-04T12:12:57.8992840Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.8993044Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.8993256Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.8993261Z 
2025-12-04T12:12:57.8993506Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.8994497Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.8994503Z 
2025-12-04T12:12:57.8994758Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.8994981Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8995087Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8995199Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8995542Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8995754Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8995863Z graph_break []
2025-12-04T12:12:57.8996107Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8996821Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8996932Z   warnings.warn(
2025-12-04T12:12:57.8997145Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8997251Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8997404Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8997615Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.8997951Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.8998044Z graph_break []
2025-12-04T12:12:57.8998252Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.8998977Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.8999074Z   warnings.warn(
2025-12-04T12:12:57.8999281Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.8999402Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.8999511Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.8999735Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9000061Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9000159Z graph_break []
2025-12-04T12:12:57.9000375Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9001320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9001423Z   warnings.warn(
2025-12-04T12:12:57.9002303Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b63fc96940c5dfca.xml -
2025-12-04T12:12:57.9002471Z =========================== short test summary info ============================
2025-12-04T12:12:57.9003550Z FAILED [0.1552s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9003559Z 
2025-12-04T12:12:57.9003770Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9004724Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9004730Z 
2025-12-04T12:12:57.9005087Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9005270Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9005515Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.9005611Z Got exit code 1
2025-12-04T12:12:57.9006480Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9006887Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.9007508Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31833c8bcf86882f.xml
2025-12-04T12:12:57.9007682Z ============================= test session starts ==============================
2025-12-04T12:12:57.9008069Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9008176Z cachedir: .pytest_cache
2025-12-04T12:12:57.9008695Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9008817Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9008938Z configfile: pytest.ini
2025-12-04T12:12:57.9009513Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9009784Z collecting ... collected 380 items / 140 deselected / 240 selected
2025-12-04T12:12:57.9009940Z stepcurrent: skipping 140 already run items.
2025-12-04T12:12:57.9010051Z Running 35 items in this shard
2025-12-04T12:12:57.9010056Z 
2025-12-04T12:12:57.9011084Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [  2%]
2025-12-04T12:12:57.9012084Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0030s] (Skip non-critical tests to save resources.) [  5%]
2025-12-04T12:12:57.9012978Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5577s] [  8%]
2025-12-04T12:12:57.9013878Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1590s] [  8%]
2025-12-04T12:12:57.9014704Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1571s] [  8%]
2025-12-04T12:12:57.9014710Z 
2025-12-04T12:12:57.9014859Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9015418Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9015547Z Traceback (most recent call last):
2025-12-04T12:12:57.9016007Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9016201Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9016420Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9016425Z 
2025-12-04T12:12:57.9016633Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9017625Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9017630Z 
2025-12-04T12:12:57.9017921Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9018134Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9018254Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9018365Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9018697Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9018926Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9019022Z graph_break []
2025-12-04T12:12:57.9019246Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9019998Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9020103Z   warnings.warn(
2025-12-04T12:12:57.9020674Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9020796Z Traceback (most recent call last):
2025-12-04T12:12:57.9021273Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9021467Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9021706Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9021710Z 
2025-12-04T12:12:57.9021936Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9022869Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9022874Z 
2025-12-04T12:12:57.9023156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9023369Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9023485Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9023611Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9023943Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9024158Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9024271Z graph_break []
2025-12-04T12:12:57.9024484Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9025214Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9025313Z   warnings.warn(
2025-12-04T12:12:57.9025528Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9025654Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9025764Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9025979Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9026322Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9026419Z graph_break []
2025-12-04T12:12:57.9026646Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9027354Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9027451Z   warnings.warn(
2025-12-04T12:12:57.9027605Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9028157Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9028307Z Traceback (most recent call last):
2025-12-04T12:12:57.9028780Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9029001Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9029218Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9029223Z 
2025-12-04T12:12:57.9029431Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9030369Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9030390Z 
2025-12-04T12:12:57.9030645Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9030885Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9031010Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9031120Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9031449Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9031674Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9031768Z graph_break []
2025-12-04T12:12:57.9031977Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9032728Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9032824Z   warnings.warn(
2025-12-04T12:12:57.9033044Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9033150Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9033264Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9033489Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9033818Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9033914Z graph_break []
2025-12-04T12:12:57.9034137Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9034847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9034962Z   warnings.warn(
2025-12-04T12:12:57.9035171Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9035277Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9035401Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9035618Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9035949Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9036056Z graph_break []
2025-12-04T12:12:57.9036263Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9036987Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9037084Z   warnings.warn(
2025-12-04T12:12:57.9037882Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31833c8bcf86882f.xml -
2025-12-04T12:12:57.9038061Z =========================== short test summary info ============================
2025-12-04T12:12:57.9039116Z FAILED [0.1571s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9039124Z 
2025-12-04T12:12:57.9039394Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9040328Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9040363Z 
2025-12-04T12:12:57.9040634Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9040809Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9041023Z ============ 1 failed, 2 skipped, 140 deselected, 2 rerun in 4.94s =============
2025-12-04T12:12:57.9041129Z Got exit code 1
2025-12-04T12:12:57.9041230Z Retrying single test...
2025-12-04T12:12:57.9041851Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6dc75b6b5f29fbb9.xml
2025-12-04T12:12:57.9042055Z ============================= test session starts ==============================
2025-12-04T12:12:57.9042474Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9042597Z cachedir: .pytest_cache
2025-12-04T12:12:57.9043106Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9043225Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9043340Z configfile: pytest.ini
2025-12-04T12:12:57.9043950Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9044172Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9045209Z stepcurrent: skipping 142 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9045322Z Running 1 items in this shard
2025-12-04T12:12:57.9045327Z 
2025-12-04T12:12:57.9046239Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5603s] [100%]
2025-12-04T12:12:57.9047133Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1625s] [100%]
2025-12-04T12:12:57.9047966Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1601s] [100%]
2025-12-04T12:12:57.9047971Z 
2025-12-04T12:12:57.9048108Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9048668Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9048800Z Traceback (most recent call last):
2025-12-04T12:12:57.9049260Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9049464Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9049667Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9049674Z 
2025-12-04T12:12:57.9049882Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9050822Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9050827Z 
2025-12-04T12:12:57.9051083Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9051344Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9051450Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9051592Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9051934Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9052147Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9052241Z graph_break []
2025-12-04T12:12:57.9052464Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9053180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9053292Z   warnings.warn(
2025-12-04T12:12:57.9053846Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9053993Z Traceback (most recent call last):
2025-12-04T12:12:57.9054464Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9054656Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9054859Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9054878Z 
2025-12-04T12:12:57.9055085Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9056047Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9056053Z 
2025-12-04T12:12:57.9056326Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9056537Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9056662Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9056773Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9057102Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9057330Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9057426Z graph_break []
2025-12-04T12:12:57.9057636Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9058360Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9058459Z   warnings.warn(
2025-12-04T12:12:57.9058679Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9058784Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9058893Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9059120Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9059447Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9059545Z graph_break []
2025-12-04T12:12:57.9059760Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9060464Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9060574Z   warnings.warn(
2025-12-04T12:12:57.9060711Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9061267Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9061395Z Traceback (most recent call last):
2025-12-04T12:12:57.9061855Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9062077Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9062295Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9062329Z 
2025-12-04T12:12:57.9062539Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9063484Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9063492Z 
2025-12-04T12:12:57.9063745Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9063954Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9064073Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9064181Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9064550Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9064763Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9064860Z graph_break []
2025-12-04T12:12:57.9065080Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9065792Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9065893Z   warnings.warn(
2025-12-04T12:12:57.9066148Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9066255Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9066382Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9066594Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9066919Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9067032Z graph_break []
2025-12-04T12:12:57.9067240Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9067952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9068063Z   warnings.warn(
2025-12-04T12:12:57.9068269Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9068386Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9068497Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9068711Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9069051Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9069147Z graph_break []
2025-12-04T12:12:57.9069353Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9070078Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9070178Z   warnings.warn(
2025-12-04T12:12:57.9070989Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6dc75b6b5f29fbb9.xml -
2025-12-04T12:12:57.9071157Z =========================== short test summary info ============================
2025-12-04T12:12:57.9072221Z FAILED [0.1601s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9072227Z 
2025-12-04T12:12:57.9072449Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9073419Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9073424Z 
2025-12-04T12:12:57.9073723Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9073897Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9074091Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.9074210Z Got exit code 1
2025-12-04T12:12:57.9074316Z Retrying single test...
2025-12-04T12:12:57.9074959Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d97c974b9c50bec3.xml
2025-12-04T12:12:57.9075115Z ============================= test session starts ==============================
2025-12-04T12:12:57.9075456Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9075608Z cachedir: .pytest_cache
2025-12-04T12:12:57.9076123Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9076243Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9076365Z configfile: pytest.ini
2025-12-04T12:12:57.9076936Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9077173Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9078236Z stepcurrent: skipping 142 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9078348Z Running 1 items in this shard
2025-12-04T12:12:57.9078354Z 
2025-12-04T12:12:57.9079273Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5551s] [100%]
2025-12-04T12:12:57.9080179Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1612s] [100%]
2025-12-04T12:12:57.9081013Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1608s] [100%]
2025-12-04T12:12:57.9081020Z 
2025-12-04T12:12:57.9081160Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9081728Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9081846Z Traceback (most recent call last):
2025-12-04T12:12:57.9082382Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9082596Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9082807Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9082813Z 
2025-12-04T12:12:57.9083035Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9083969Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9083976Z 
2025-12-04T12:12:57.9084236Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9084467Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9084578Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9084708Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9085085Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9085303Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9085446Z graph_break []
2025-12-04T12:12:57.9085656Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9086378Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9086494Z   warnings.warn(
2025-12-04T12:12:57.9087052Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9087184Z Traceback (most recent call last):
2025-12-04T12:12:57.9087644Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9087866Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9088090Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9088097Z 
2025-12-04T12:12:57.9088306Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9089247Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9089280Z 
2025-12-04T12:12:57.9089538Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9089748Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9089867Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9089978Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9090307Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9090536Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9090629Z graph_break []
2025-12-04T12:12:57.9090850Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9091565Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9091663Z   warnings.warn(
2025-12-04T12:12:57.9091886Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9091996Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9092107Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9092331Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9092656Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9092768Z graph_break []
2025-12-04T12:12:57.9092980Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9093690Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9093806Z   warnings.warn(
2025-12-04T12:12:57.9093944Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9094498Z _ NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9094630Z Traceback (most recent call last):
2025-12-04T12:12:57.9095090Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9095296Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9095503Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9095507Z 
2025-12-04T12:12:57.9095749Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9096707Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9096742Z 
2025-12-04T12:12:57.9097002Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9097225Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9097336Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9097447Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9097789Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9098000Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9098096Z graph_break []
2025-12-04T12:12:57.9098347Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9099062Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9099175Z   warnings.warn(
2025-12-04T12:12:57.9099383Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9099489Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9099612Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9099854Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9100182Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9100287Z graph_break []
2025-12-04T12:12:57.9100493Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9101580Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9101682Z   warnings.warn(
2025-12-04T12:12:57.9101891Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9102014Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9102176Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9102472Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9102815Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9102914Z graph_break []
2025-12-04T12:12:57.9103134Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9103847Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9103951Z   warnings.warn(
2025-12-04T12:12:57.9104766Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d97c974b9c50bec3.xml -
2025-12-04T12:12:57.9104934Z =========================== short test summary info ============================
2025-12-04T12:12:57.9106019Z FAILED [0.1608s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9106027Z 
2025-12-04T12:12:57.9106238Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9107167Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9107185Z 
2025-12-04T12:12:57.9107447Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9107721Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9107935Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:57.9108077Z Got exit code 1
2025-12-04T12:12:57.9108927Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9109345Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.9109969Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-588d96b64bf97b8d.xml
2025-12-04T12:12:57.9110143Z ============================= test session starts ==============================
2025-12-04T12:12:57.9110523Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9110634Z cachedir: .pytest_cache
2025-12-04T12:12:57.9111152Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9111274Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9111396Z configfile: pytest.ini
2025-12-04T12:12:57.9111971Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9112234Z collecting ... collected 380 items / 143 deselected / 237 selected
2025-12-04T12:12:57.9112392Z stepcurrent: skipping 143 already run items.
2025-12-04T12:12:57.9112506Z Running 32 items in this shard
2025-12-04T12:12:57.9112511Z 
2025-12-04T12:12:57.9113529Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0042s] (Skip non-critical tests to save resources.) [  3%]
2025-12-04T12:12:57.9114553Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0031s] (Skip non-critical tests to save resources.) [  6%]
2025-12-04T12:12:57.9115549Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0038s] (Skip non-critical tests to save resources.) [  9%]
2025-12-04T12:12:57.9116461Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5282s] [ 12%]
2025-12-04T12:12:57.9117355Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1587s] [ 12%]
2025-12-04T12:12:57.9118188Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1574s] [ 12%]
2025-12-04T12:12:57.9118196Z 
2025-12-04T12:12:57.9118336Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9118909Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9119029Z Traceback (most recent call last):
2025-12-04T12:12:57.9119491Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9119697Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9119904Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9119910Z 
2025-12-04T12:12:57.9120139Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9121189Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9121223Z 
2025-12-04T12:12:57.9121484Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9121713Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9121825Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9121935Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9122349Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9122566Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9122674Z graph_break []
2025-12-04T12:12:57.9122939Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9123664Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9123775Z   warnings.warn(
2025-12-04T12:12:57.9124331Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9124465Z Traceback (most recent call last):
2025-12-04T12:12:57.9124961Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9125154Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9125375Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9125380Z 
2025-12-04T12:12:57.9125589Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9126531Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9126552Z 
2025-12-04T12:12:57.9126811Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9127022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9127144Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9127255Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9127590Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9127814Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9127909Z graph_break []
2025-12-04T12:12:57.9128130Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9128852Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9128953Z   warnings.warn(
2025-12-04T12:12:57.9129181Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9129288Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9129400Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9129625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9129953Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9130064Z graph_break []
2025-12-04T12:12:57.9130272Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9130983Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9131091Z   warnings.warn(
2025-12-04T12:12:57.9131284Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9131843Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9132002Z Traceback (most recent call last):
2025-12-04T12:12:57.9132462Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9132665Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9132874Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9132878Z 
2025-12-04T12:12:57.9133089Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9134037Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9134073Z 
2025-12-04T12:12:57.9134337Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9134561Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9134674Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9134788Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9135133Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9135349Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9135477Z graph_break []
2025-12-04T12:12:57.9135699Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9136415Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9136528Z   warnings.warn(
2025-12-04T12:12:57.9136739Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9136852Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9136981Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9137202Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9137533Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9137644Z graph_break []
2025-12-04T12:12:57.9137855Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9138587Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9138686Z   warnings.warn(
2025-12-04T12:12:57.9138897Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9139021Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9139134Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9139355Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9139698Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9139798Z graph_break []
2025-12-04T12:12:57.9140023Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9140734Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9140835Z   warnings.warn(
2025-12-04T12:12:57.9141649Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-588d96b64bf97b8d.xml -
2025-12-04T12:12:57.9141817Z =========================== short test summary info ============================
2025-12-04T12:12:57.9142936Z FAILED [0.1574s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9142969Z 
2025-12-04T12:12:57.9143183Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9144116Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9144138Z 
2025-12-04T12:12:57.9144400Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9144578Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9144810Z ============ 1 failed, 3 skipped, 143 deselected, 2 rerun in 4.91s =============
2025-12-04T12:12:57.9144911Z Got exit code 1
2025-12-04T12:12:57.9145017Z Retrying single test...
2025-12-04T12:12:57.9145693Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-78303b7c44b57e72.xml
2025-12-04T12:12:57.9145854Z ============================= test session starts ==============================
2025-12-04T12:12:57.9146209Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9146317Z cachedir: .pytest_cache
2025-12-04T12:12:57.9146827Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9146995Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9147097Z configfile: pytest.ini
2025-12-04T12:12:57.9147671Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9147912Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9148934Z stepcurrent: skipping 146 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9149065Z Running 1 items in this shard
2025-12-04T12:12:57.9149070Z 
2025-12-04T12:12:57.9149968Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5536s] [100%]
2025-12-04T12:12:57.9150869Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1586s] [100%]
2025-12-04T12:12:57.9151702Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1574s] [100%]
2025-12-04T12:12:57.9151708Z 
2025-12-04T12:12:57.9151845Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9152414Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9152530Z Traceback (most recent call last):
2025-12-04T12:12:57.9153002Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9153196Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9153401Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9153406Z 
2025-12-04T12:12:57.9153624Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9154596Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9154602Z 
2025-12-04T12:12:57.9154877Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9155118Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9155226Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9155352Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9155682Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9155896Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9156005Z graph_break []
2025-12-04T12:12:57.9156218Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9156952Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9157081Z   warnings.warn(
2025-12-04T12:12:57.9157640Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9157776Z Traceback (most recent call last):
2025-12-04T12:12:57.9158235Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9158430Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9158650Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9158685Z 
2025-12-04T12:12:57.9158895Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9159837Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9159844Z 
2025-12-04T12:12:57.9160105Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9160332Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9160442Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9160556Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9160898Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9161110Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9161207Z graph_break []
2025-12-04T12:12:57.9161428Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9162215Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9162332Z   warnings.warn(
2025-12-04T12:12:57.9162544Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9162654Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9162779Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9162993Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9163323Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9163432Z graph_break []
2025-12-04T12:12:57.9163643Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9164355Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9164466Z   warnings.warn(
2025-12-04T12:12:57.9164606Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9165177Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9165297Z Traceback (most recent call last):
2025-12-04T12:12:57.9165793Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9166041Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9166250Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9166254Z 
2025-12-04T12:12:57.9166475Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9167406Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9167411Z 
2025-12-04T12:12:57.9167674Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9167898Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9168039Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9168175Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9168505Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9168720Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9168829Z graph_break []
2025-12-04T12:12:57.9169036Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9169751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9169893Z   warnings.warn(
2025-12-04T12:12:57.9170100Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9170220Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9170329Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9170544Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9170886Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9170982Z graph_break []
2025-12-04T12:12:57.9171190Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9171907Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9172005Z   warnings.warn(
2025-12-04T12:12:57.9172223Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9172331Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9172441Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9172666Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9172994Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9173089Z graph_break []
2025-12-04T12:12:57.9173312Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9174023Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9174132Z   warnings.warn(
2025-12-04T12:12:57.9174931Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-78303b7c44b57e72.xml -
2025-12-04T12:12:57.9175097Z =========================== short test summary info ============================
2025-12-04T12:12:57.9176171Z FAILED [0.1574s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9176177Z 
2025-12-04T12:12:57.9176427Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9177376Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9177412Z 
2025-12-04T12:12:57.9177672Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9177847Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9178056Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.9178156Z Got exit code 1
2025-12-04T12:12:57.9178275Z Retrying single test...
2025-12-04T12:12:57.9178901Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0fe2928d1b5c12d6.xml
2025-12-04T12:12:57.9179092Z ============================= test session starts ==============================
2025-12-04T12:12:57.9179445Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9179551Z cachedir: .pytest_cache
2025-12-04T12:12:57.9180058Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9180193Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9180297Z configfile: pytest.ini
2025-12-04T12:12:57.9180887Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9181140Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9182160Z stepcurrent: skipping 146 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9182287Z Running 1 items in this shard
2025-12-04T12:12:57.9182292Z 
2025-12-04T12:12:57.9183189Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5710s] [100%]
2025-12-04T12:12:57.9184092Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1658s] [100%]
2025-12-04T12:12:57.9184908Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1594s] [100%]
2025-12-04T12:12:57.9184914Z 
2025-12-04T12:12:57.9185064Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9185625Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9185746Z Traceback (most recent call last):
2025-12-04T12:12:57.9186222Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9186418Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9186620Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9186639Z 
2025-12-04T12:12:57.9186851Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9187782Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9187788Z 
2025-12-04T12:12:57.9188057Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9188314Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9188439Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9188551Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9188916Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9189142Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9189238Z graph_break []
2025-12-04T12:12:57.9189447Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9190180Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9190278Z   warnings.warn(
2025-12-04T12:12:57.9190850Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9191000Z Traceback (most recent call last):
2025-12-04T12:12:57.9191462Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9191669Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9191876Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9191881Z 
2025-12-04T12:12:57.9192089Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9193037Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9193080Z 
2025-12-04T12:12:57.9193338Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9193560Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9193669Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9193781Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9194128Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9194343Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9194450Z graph_break []
2025-12-04T12:12:57.9194658Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9195374Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9195495Z   warnings.warn(
2025-12-04T12:12:57.9195703Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9195810Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9195937Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9196152Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9196498Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9196595Z graph_break []
2025-12-04T12:12:57.9196809Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9197531Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9197629Z   warnings.warn(
2025-12-04T12:12:57.9197772Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9198344Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9198464Z Traceback (most recent call last):
2025-12-04T12:12:57.9198940Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9199142Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9199384Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9199416Z 
2025-12-04T12:12:57.9199644Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9200581Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9200589Z 
2025-12-04T12:12:57.9201046Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9201258Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9201371Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9201499Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9201834Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9202177Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9202295Z graph_break []
2025-12-04T12:12:57.9202506Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9203243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9203343Z   warnings.warn(
2025-12-04T12:12:57.9203552Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9203724Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9203838Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9204051Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9204398Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9204494Z graph_break []
2025-12-04T12:12:57.9204723Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9205432Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9205531Z   warnings.warn(
2025-12-04T12:12:57.9205748Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9205856Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9205967Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9206197Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9206522Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9206633Z graph_break []
2025-12-04T12:12:57.9206838Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9207547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9207655Z   warnings.warn(
2025-12-04T12:12:57.9208460Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0fe2928d1b5c12d6.xml -
2025-12-04T12:12:57.9208642Z =========================== short test summary info ============================
2025-12-04T12:12:57.9209709Z FAILED [0.1594s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9209717Z 
2025-12-04T12:12:57.9209932Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9210942Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9210948Z 
2025-12-04T12:12:57.9211211Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9211437Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9211630Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ==================
2025-12-04T12:12:57.9211726Z Got exit code 1
2025-12-04T12:12:57.9212590Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9212989Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.9213627Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85a344e8e648e5ca.xml
2025-12-04T12:12:57.9213819Z ============================= test session starts ==============================
2025-12-04T12:12:57.9214164Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9214284Z cachedir: .pytest_cache
2025-12-04T12:12:57.9214794Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9214932Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9215070Z configfile: pytest.ini
2025-12-04T12:12:57.9215643Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9215880Z collecting ... collected 380 items / 147 deselected / 233 selected
2025-12-04T12:12:57.9216024Z stepcurrent: skipping 147 already run items.
2025-12-04T12:12:57.9216141Z Running 28 items in this shard
2025-12-04T12:12:57.9216145Z 
2025-12-04T12:12:57.9217174Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [  3%]
2025-12-04T12:12:57.9218077Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.6359s] [  7%]
2025-12-04T12:12:57.9218991Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1665s] [  7%]
2025-12-04T12:12:57.9219807Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1616s] [  7%]
2025-12-04T12:12:57.9219812Z 
2025-12-04T12:12:57.9219967Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9220525Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9220647Z Traceback (most recent call last):
2025-12-04T12:12:57.9221120Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9221314Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9221538Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9221543Z 
2025-12-04T12:12:57.9221752Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9222683Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9222688Z 
2025-12-04T12:12:57.9222992Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9223208Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9223359Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9223472Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9223803Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9224031Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9224128Z graph_break []
2025-12-04T12:12:57.9224337Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9225071Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9225168Z   warnings.warn(
2025-12-04T12:12:57.9225770Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9225890Z Traceback (most recent call last):
2025-12-04T12:12:57.9226349Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9226554Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9226762Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9226766Z 
2025-12-04T12:12:57.9227016Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9227946Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9227952Z 
2025-12-04T12:12:57.9228210Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9228439Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9228548Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9228661Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9229004Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9229217Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9229326Z graph_break []
2025-12-04T12:12:57.9229535Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9230257Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9230368Z   warnings.warn(
2025-12-04T12:12:57.9230576Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9230683Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9230809Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9231027Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9231368Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9231466Z graph_break []
2025-12-04T12:12:57.9231673Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9232400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9232500Z   warnings.warn(
2025-12-04T12:12:57.9232640Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9233210Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9233329Z Traceback (most recent call last):
2025-12-04T12:12:57.9233838Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9234040Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9234274Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9234279Z 
2025-12-04T12:12:57.9234498Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9235435Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9235443Z 
2025-12-04T12:12:57.9235713Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9235924Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9236033Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9236194Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9236528Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9236758Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9236860Z graph_break []
2025-12-04T12:12:57.9237068Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9237798Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9237928Z   warnings.warn(
2025-12-04T12:12:57.9238136Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9238254Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9238364Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9238589Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9238918Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9239015Z graph_break []
2025-12-04T12:12:57.9239235Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9239947Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9240045Z   warnings.warn(
2025-12-04T12:12:57.9240264Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9240374Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9240496Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9240708Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9241035Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9241141Z graph_break []
2025-12-04T12:12:57.9241350Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9242059Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9242242Z   warnings.warn(
2025-12-04T12:12:57.9243045Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85a344e8e648e5ca.xml -
2025-12-04T12:12:57.9243224Z =========================== short test summary info ============================
2025-12-04T12:12:57.9244298Z FAILED [0.1616s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9244303Z 
2025-12-04T12:12:57.9244516Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9245508Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9245542Z 
2025-12-04T12:12:57.9245805Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9245992Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9246209Z ============ 1 failed, 1 skipped, 147 deselected, 2 rerun in 5.02s =============
2025-12-04T12:12:57.9246309Z Got exit code 1
2025-12-04T12:12:57.9246430Z Retrying single test...
2025-12-04T12:12:57.9247056Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d0617f72a4b97751.xml
2025-12-04T12:12:57.9247231Z ============================= test session starts ==============================
2025-12-04T12:12:57.9247606Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9247715Z cachedir: .pytest_cache
2025-12-04T12:12:57.9248239Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9248364Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9248470Z configfile: pytest.ini
2025-12-04T12:12:57.9249056Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9249326Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9250356Z stepcurrent: skipping 148 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9250469Z Running 1 items in this shard
2025-12-04T12:12:57.9250474Z 
2025-12-04T12:12:57.9251375Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.6463s] [100%]
2025-12-04T12:12:57.9252285Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1661s] [100%]
2025-12-04T12:12:57.9253105Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1613s] [100%]
2025-12-04T12:12:57.9253113Z 
2025-12-04T12:12:57.9253264Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9253821Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9253957Z Traceback (most recent call last):
2025-12-04T12:12:57.9254419Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9254616Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9254843Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9254847Z 
2025-12-04T12:12:57.9255055Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9255997Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9256002Z 
2025-12-04T12:12:57.9256262Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9256474Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9256599Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9256742Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9257076Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9257337Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9257436Z graph_break []
2025-12-04T12:12:57.9257658Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9258379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9258482Z   warnings.warn(
2025-12-04T12:12:57.9259054Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9259173Z Traceback (most recent call last):
2025-12-04T12:12:57.9259682Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9259877Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9260083Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9260088Z 
2025-12-04T12:12:57.9260316Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9261248Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9261283Z 
2025-12-04T12:12:57.9261556Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9261767Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9261878Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9262009Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9262342Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9262558Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9262675Z graph_break []
2025-12-04T12:12:57.9262886Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9263616Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9263717Z   warnings.warn(
2025-12-04T12:12:57.9263925Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9264049Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9264162Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9264372Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9264715Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9264813Z graph_break []
2025-12-04T12:12:57.9265038Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9265751Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9265850Z   warnings.warn(
2025-12-04T12:12:57.9266004Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9266562Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9266679Z Traceback (most recent call last):
2025-12-04T12:12:57.9267154Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9267346Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9267569Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9267617Z 
2025-12-04T12:12:57.9267829Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9268793Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9268811Z 
2025-12-04T12:12:57.9269067Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9269281Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9269402Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9269513Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9269841Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9270063Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9270185Z graph_break []
2025-12-04T12:12:57.9270399Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9271127Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9271223Z   warnings.warn(
2025-12-04T12:12:57.9271442Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9271580Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9271692Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9271915Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9272242Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9272336Z graph_break []
2025-12-04T12:12:57.9272557Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9273270Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9273384Z   warnings.warn(
2025-12-04T12:12:57.9273591Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9273700Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9273823Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9274035Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9274364Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9274473Z graph_break []
2025-12-04T12:12:57.9274680Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9275404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9275503Z   warnings.warn(
2025-12-04T12:12:57.9276301Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d0617f72a4b97751.xml -
2025-12-04T12:12:57.9276485Z =========================== short test summary info ============================
2025-12-04T12:12:57.9277543Z FAILED [0.1613s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9277551Z 
2025-12-04T12:12:57.9277774Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9278706Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9278714Z 
2025-12-04T12:12:57.9279020Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9279197Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9279421Z ================== 1 failed, 174 deselected, 2 rerun in 5.03s ==================
2025-12-04T12:12:57.9279532Z Got exit code 1
2025-12-04T12:12:57.9279633Z Retrying single test...
2025-12-04T12:12:57.9280255Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e1acf558219bc739.xml
2025-12-04T12:12:57.9280426Z ============================= test session starts ==============================
2025-12-04T12:12:57.9280767Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9280886Z cachedir: .pytest_cache
2025-12-04T12:12:57.9281421Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9281545Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9281662Z configfile: pytest.ini
2025-12-04T12:12:57.9282312Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9282537Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9283573Z stepcurrent: skipping 148 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9283730Z Running 1 items in this shard
2025-12-04T12:12:57.9283735Z 
2025-12-04T12:12:57.9284655Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5530s] [100%]
2025-12-04T12:12:57.9285551Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1618s] [100%]
2025-12-04T12:12:57.9286387Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1573s] [100%]
2025-12-04T12:12:57.9286393Z 
2025-12-04T12:12:57.9286534Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9287088Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9287223Z Traceback (most recent call last):
2025-12-04T12:12:57.9287685Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9287895Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9288105Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9288112Z 
2025-12-04T12:12:57.9288322Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9289268Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9289276Z 
2025-12-04T12:12:57.9289536Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9289764Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9289876Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9289989Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9290334Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9290584Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9290679Z graph_break []
2025-12-04T12:12:57.9290902Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9291652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9291763Z   warnings.warn(
2025-12-04T12:12:57.9292326Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9292446Z Traceback (most recent call last):
2025-12-04T12:12:57.9292918Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9293109Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9293357Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9293376Z 
2025-12-04T12:12:57.9293586Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9294521Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9294528Z 
2025-12-04T12:12:57.9294799Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9295115Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9295239Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9295351Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9295764Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9295996Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9296092Z graph_break []
2025-12-04T12:12:57.9296308Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9297038Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9297138Z   warnings.warn(
2025-12-04T12:12:57.9297363Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9297470Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9297583Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9297812Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9298139Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9298234Z graph_break []
2025-12-04T12:12:57.9298455Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9299170Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9299279Z   warnings.warn(
2025-12-04T12:12:57.9299420Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9299978Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9300111Z Traceback (most recent call last):
2025-12-04T12:12:57.9300569Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9300762Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9301148Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9301154Z 
2025-12-04T12:12:57.9301361Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9302386Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9302430Z 
2025-12-04T12:12:57.9302688Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9302898Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9303020Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9303130Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9303473Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9303684Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9303777Z graph_break []
2025-12-04T12:12:57.9304001Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9304764Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9304864Z   warnings.warn(
2025-12-04T12:12:57.9305088Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9305198Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9305323Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9305538Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9305873Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9306022Z graph_break []
2025-12-04T12:12:57.9306231Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9306945Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9307056Z   warnings.warn(
2025-12-04T12:12:57.9307265Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9307386Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9307499Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9307713Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9308052Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9308147Z graph_break []
2025-12-04T12:12:57.9308351Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9309073Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9309170Z   warnings.warn(
2025-12-04T12:12:57.9309982Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e1acf558219bc739.xml -
2025-12-04T12:12:57.9310151Z =========================== short test summary info ============================
2025-12-04T12:12:57.9311207Z FAILED [0.1573s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9311214Z 
2025-12-04T12:12:57.9311438Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9312374Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9312379Z 
2025-12-04T12:12:57.9312650Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9312824Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9313062Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:57.9313182Z Got exit code 1
2025-12-04T12:12:57.9314067Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9314479Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.9315109Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d7d30d97e183551e.xml
2025-12-04T12:12:57.9316081Z ============================= test session starts ==============================
2025-12-04T12:12:57.9316740Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9317320Z cachedir: .pytest_cache
2025-12-04T12:12:57.9318055Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9318833Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9319168Z configfile: pytest.ini
2025-12-04T12:12:57.9319932Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9320884Z collecting ... collected 380 items / 149 deselected / 231 selected
2025-12-04T12:12:57.9321422Z stepcurrent: skipping 149 already run items.
2025-12-04T12:12:57.9321802Z Running 26 items in this shard
2025-12-04T12:12:57.9322022Z 
2025-12-04T12:12:57.9323122Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  3%]
2025-12-04T12:12:57.9325351Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0029s] (Skip non-critical tests to save resources.) [  7%]
2025-12-04T12:12:57.9327513Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0036s] (Skip non-critical tests to save resources.) [ 11%]
2025-12-04T12:12:57.9329653Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0026s] (Skip non-critical tests to save resources.) [ 15%]
2025-12-04T12:12:57.9331683Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5405s] [ 19%]
2025-12-04T12:12:57.9333622Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1566s] [ 19%]
2025-12-04T12:12:57.9335474Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1564s] [ 19%]
2025-12-04T12:12:57.9336440Z 
2025-12-04T12:12:57.9336579Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9337425Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9338228Z Traceback (most recent call last):
2025-12-04T12:12:57.9338931Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9339721Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9340249Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9340633Z 
2025-12-04T12:12:57.9340844Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9342154Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9343214Z 
2025-12-04T12:12:57.9343488Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9344112Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9344565Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9344897Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9345443Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9346118Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9346598Z graph_break []
2025-12-04T12:12:57.9346970Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9348035Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9348988Z   warnings.warn(
2025-12-04T12:12:57.9349711Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9350554Z Traceback (most recent call last):
2025-12-04T12:12:57.9351234Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9352027Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9352559Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9352892Z 
2025-12-04T12:12:57.9353119Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9354386Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9355459Z 
2025-12-04T12:12:57.9355718Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9356335Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9356799Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9357115Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9357655Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9358343Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9358777Z graph_break []
2025-12-04T12:12:57.9359141Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9360217Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9361172Z   warnings.warn(
2025-12-04T12:12:57.9361533Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9361999Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9362402Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9362821Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9363511Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9364077Z graph_break []
2025-12-04T12:12:57.9364431Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9365504Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9366487Z   warnings.warn(
2025-12-04T12:12:57.9366791Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9367670Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9368483Z Traceback (most recent call last):
2025-12-04T12:12:57.9369179Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9376308Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9376873Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9377216Z 
2025-12-04T12:12:57.9377432Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9378826Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9379909Z 
2025-12-04T12:12:57.9380181Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9380808Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9381269Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9381608Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9382159Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9382899Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9383342Z graph_break []
2025-12-04T12:12:57.9383711Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9384795Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9385734Z   warnings.warn(
2025-12-04T12:12:57.9386118Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9386580Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9386894Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9387319Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9388004Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9388574Z graph_break []
2025-12-04T12:12:57.9388926Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9389997Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9390947Z   warnings.warn(
2025-12-04T12:12:57.9391306Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9391777Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9392109Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9392542Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9393211Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9393779Z graph_break []
2025-12-04T12:12:57.9394139Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9395193Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9396135Z   warnings.warn(
2025-12-04T12:12:57.9397092Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d7d30d97e183551e.xml -
2025-12-04T12:12:57.9398195Z =========================== short test summary info ============================
2025-12-04T12:12:57.9399615Z FAILED [0.1564s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9401094Z 
2025-12-04T12:12:57.9401316Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9402713Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9403782Z 
2025-12-04T12:12:57.9404063Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9404649Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9405241Z ============ 1 failed, 4 skipped, 149 deselected, 2 rerun in 4.92s =============
2025-12-04T12:12:57.9405711Z Got exit code 1
2025-12-04T12:12:57.9405982Z Retrying single test...
2025-12-04T12:12:57.9406783Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-42d654f8293abc5a.xml
2025-12-04T12:12:57.9407722Z ============================= test session starts ==============================
2025-12-04T12:12:57.9408375Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9409009Z cachedir: .pytest_cache
2025-12-04T12:12:57.9409693Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9410460Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9410813Z configfile: pytest.ini
2025-12-04T12:12:57.9411560Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9412496Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9413873Z stepcurrent: skipping 153 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9415142Z Running 1 items in this shard
2025-12-04T12:12:57.9415348Z 
2025-12-04T12:12:57.9416255Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5673s] [100%]
2025-12-04T12:12:57.9418190Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1606s] [100%]
2025-12-04T12:12:57.9420046Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1578s] [100%]
2025-12-04T12:12:57.9421011Z 
2025-12-04T12:12:57.9421149Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9421989Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9422790Z Traceback (most recent call last):
2025-12-04T12:12:57.9423490Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9424279Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9424813Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9425145Z 
2025-12-04T12:12:57.9425358Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9426692Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9427805Z 
2025-12-04T12:12:57.9428078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9428695Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9429147Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9429483Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9430032Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9430716Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9431163Z graph_break []
2025-12-04T12:12:57.9431527Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9432642Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9433584Z   warnings.warn(
2025-12-04T12:12:57.9434309Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9435125Z Traceback (most recent call last):
2025-12-04T12:12:57.9435805Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9436633Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9437175Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9437506Z 
2025-12-04T12:12:57.9437730Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9439003Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9440076Z 
2025-12-04T12:12:57.9440339Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9440953Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9441422Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9441737Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9442350Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9443052Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9443494Z graph_break []
2025-12-04T12:12:57.9443867Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9444941Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9445907Z   warnings.warn(
2025-12-04T12:12:57.9446267Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9446732Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9447062Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9447480Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9448169Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9448736Z graph_break []
2025-12-04T12:12:57.9449105Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9450163Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9451108Z   warnings.warn(
2025-12-04T12:12:57.9451416Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9452296Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9453118Z Traceback (most recent call last):
2025-12-04T12:12:57.9453847Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9454642Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9455170Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9455517Z 
2025-12-04T12:12:57.9455727Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9457002Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9458062Z 
2025-12-04T12:12:57.9458336Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9458971Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9459434Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9459769Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9460298Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9460994Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9461440Z graph_break []
2025-12-04T12:12:57.9461842Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9462899Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9463846Z   warnings.warn(
2025-12-04T12:12:57.9464221Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9464669Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9465002Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9465435Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9466113Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9466671Z graph_break []
2025-12-04T12:12:57.9467032Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9468094Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9469031Z   warnings.warn(
2025-12-04T12:12:57.9469401Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9469862Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9470189Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9470607Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9471294Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9471861Z graph_break []
2025-12-04T12:12:57.9472212Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9473271Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9474220Z   warnings.warn(
2025-12-04T12:12:57.9475182Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-42d654f8293abc5a.xml -
2025-12-04T12:12:57.9476265Z =========================== short test summary info ============================
2025-12-04T12:12:57.9477698Z FAILED [0.1578s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9478914Z 
2025-12-04T12:12:57.9479129Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9480446Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9481511Z 
2025-12-04T12:12:57.9481791Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9482435Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9482951Z ================== 1 failed, 174 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:57.9483391Z Got exit code 1
2025-12-04T12:12:57.9483644Z Retrying single test...
2025-12-04T12:12:57.9484495Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ee7c83ecdc672647.xml
2025-12-04T12:12:57.9485422Z ============================= test session starts ==============================
2025-12-04T12:12:57.9486069Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9486646Z cachedir: .pytest_cache
2025-12-04T12:12:57.9487335Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9488135Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9488466Z configfile: pytest.ini
2025-12-04T12:12:57.9489231Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9490160Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9491546Z stepcurrent: skipping 153 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9492800Z Running 1 items in this shard
2025-12-04T12:12:57.9493012Z 
2025-12-04T12:12:57.9493918Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5346s] [100%]
2025-12-04T12:12:57.9495851Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1571s] [100%]
2025-12-04T12:12:57.9497697Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1564s] [100%]
2025-12-04T12:12:57.9498639Z 
2025-12-04T12:12:57.9498795Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9499623Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9500437Z Traceback (most recent call last):
2025-12-04T12:12:57.9501421Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9502220Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9502754Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9503103Z 
2025-12-04T12:12:57.9503313Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9504591Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9505652Z 
2025-12-04T12:12:57.9506014Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9506626Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9507137Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9507466Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9508002Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9508689Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9509144Z graph_break []
2025-12-04T12:12:57.9509496Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9510568Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9511516Z   warnings.warn(
2025-12-04T12:12:57.9512275Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9513080Z Traceback (most recent call last):
2025-12-04T12:12:57.9513777Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9514565Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9515104Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9515438Z 
2025-12-04T12:12:57.9515690Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9516964Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9518040Z 
2025-12-04T12:12:57.9518303Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9518919Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9519374Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9519701Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9520255Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9520937Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9521371Z graph_break []
2025-12-04T12:12:57.9521738Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9522873Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9523809Z   warnings.warn(
2025-12-04T12:12:57.9524182Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9524643Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9524972Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9525382Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9526064Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9526634Z graph_break []
2025-12-04T12:12:57.9526981Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9528039Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9528983Z   warnings.warn(
2025-12-04T12:12:57.9529283Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9530112Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:57.9530925Z Traceback (most recent call last):
2025-12-04T12:12:57.9531660Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9532438Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9533007Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9533349Z 
2025-12-04T12:12:57.9533556Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9534832Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9535899Z 
2025-12-04T12:12:57.9536158Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9536763Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9537221Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9537547Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9538111Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9538796Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9539240Z graph_break []
2025-12-04T12:12:57.9539588Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9540651Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9541632Z   warnings.warn(
2025-12-04T12:12:57.9542001Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9542445Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9542765Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9543191Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9543866Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9544429Z graph_break []
2025-12-04T12:12:57.9544782Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9545843Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9546770Z   warnings.warn(
2025-12-04T12:12:57.9547136Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9547594Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9547913Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9548337Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9549015Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9549575Z graph_break []
2025-12-04T12:12:57.9549925Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9551000Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9551943Z   warnings.warn(
2025-12-04T12:12:57.9552888Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ee7c83ecdc672647.xml -
2025-12-04T12:12:57.9553989Z =========================== short test summary info ============================
2025-12-04T12:12:57.9555366Z FAILED [0.1564s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9556560Z 
2025-12-04T12:12:57.9556782Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9558098Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9559189Z 
2025-12-04T12:12:57.9559446Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9560026Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9560536Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ==================
2025-12-04T12:12:57.9560956Z Got exit code 1
2025-12-04T12:12:57.9561968Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:57.9563426Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.9564634Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0ceb6628ed982867.xml
2025-12-04T12:12:57.9565537Z ============================= test session starts ==============================
2025-12-04T12:12:57.9566186Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9566771Z cachedir: .pytest_cache
2025-12-04T12:12:57.9567460Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9568257Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9568598Z configfile: pytest.ini
2025-12-04T12:12:57.9569352Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9570283Z collecting ... collected 380 items / 154 deselected / 226 selected
2025-12-04T12:12:57.9570768Z stepcurrent: skipping 154 already run items.
2025-12-04T12:12:57.9571148Z Running 21 items in this shard
2025-12-04T12:12:57.9571354Z 
2025-12-04T12:12:57.9572382Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0039s] (Skip non-critical tests to save resources.) [  4%]
2025-12-04T12:12:57.9574441Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5479s] [  9%]
2025-12-04T12:12:57.9576349Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1578s] [  9%]
2025-12-04T12:12:57.9578201Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1559s] [  9%]
2025-12-04T12:12:57.9579171Z 
2025-12-04T12:12:57.9579307Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9580146Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9580945Z Traceback (most recent call last):
2025-12-04T12:12:57.9581636Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9582427Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9582956Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9583283Z 
2025-12-04T12:12:57.9583488Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9584802Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9585869Z 
2025-12-04T12:12:57.9586125Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9586766Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9587216Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9587535Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9588079Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9588752Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9589195Z graph_break []
2025-12-04T12:12:57.9589556Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9590663Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9591597Z   warnings.warn(
2025-12-04T12:12:57.9592323Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9593134Z Traceback (most recent call last):
2025-12-04T12:12:57.9593830Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9594609Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9595179Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9595507Z 
2025-12-04T12:12:57.9595732Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9596997Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9598068Z 
2025-12-04T12:12:57.9598329Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9598941Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9599409Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9599724Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9600263Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9601116Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9601570Z graph_break []
2025-12-04T12:12:57.9601921Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9603053Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9604002Z   warnings.warn(
2025-12-04T12:12:57.9604367Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9604824Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9605154Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9605569Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9606251Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9606804Z graph_break []
2025-12-04T12:12:57.9607160Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9608210Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9609154Z   warnings.warn(
2025-12-04T12:12:57.9609453Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9610297Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9611169Z Traceback (most recent call last):
2025-12-04T12:12:57.9611868Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9612693Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9613208Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9613555Z 
2025-12-04T12:12:57.9613763Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9615038Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9616093Z 
2025-12-04T12:12:57.9616363Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9616995Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9617461Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9617781Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9618317Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9618984Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9619426Z graph_break []
2025-12-04T12:12:57.9619786Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9620880Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9621822Z   warnings.warn(
2025-12-04T12:12:57.9622185Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9622644Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9622962Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9623389Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9624069Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9624623Z graph_break []
2025-12-04T12:12:57.9624982Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9626045Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9626983Z   warnings.warn(
2025-12-04T12:12:57.9627343Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9627802Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9628127Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9628537Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9629210Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9629772Z graph_break []
2025-12-04T12:12:57.9630120Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9631182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9632122Z   warnings.warn(
2025-12-04T12:12:57.9633076Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0ceb6628ed982867.xml -
2025-12-04T12:12:57.9634164Z =========================== short test summary info ============================
2025-12-04T12:12:57.9635538Z FAILED [0.1559s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9636747Z 
2025-12-04T12:12:57.9636997Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9638276Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9639364Z 
2025-12-04T12:12:57.9639637Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9640195Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9640722Z ============ 1 failed, 1 skipped, 154 deselected, 2 rerun in 4.92s =============
2025-12-04T12:12:57.9641174Z Got exit code 1
2025-12-04T12:12:57.9641428Z Retrying single test...
2025-12-04T12:12:57.9642305Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8928a6b00b051b8.xml
2025-12-04T12:12:57.9643271Z ============================= test session starts ==============================
2025-12-04T12:12:57.9643921Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9644496Z cachedir: .pytest_cache
2025-12-04T12:12:57.9645177Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9645942Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9646284Z configfile: pytest.ini
2025-12-04T12:12:57.9647110Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9648046Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9649424Z stepcurrent: skipping 155 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9650685Z Running 1 items in this shard
2025-12-04T12:12:57.9650901Z 
2025-12-04T12:12:57.9651800Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5961s] [100%]
2025-12-04T12:12:57.9653734Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1653s] [100%]
2025-12-04T12:12:57.9655578Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1616s] [100%]
2025-12-04T12:12:57.9656528Z 
2025-12-04T12:12:57.9656675Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9657512Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9658322Z Traceback (most recent call last):
2025-12-04T12:12:57.9659026Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9659820Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9660350Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9660696Z 
2025-12-04T12:12:57.9660910Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9662203Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9663268Z 
2025-12-04T12:12:57.9663547Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9664201Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9664669Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9665035Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9665570Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9666261Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9666721Z graph_break []
2025-12-04T12:12:57.9667092Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9668157Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9669115Z   warnings.warn(
2025-12-04T12:12:57.9669833Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9670707Z Traceback (most recent call last):
2025-12-04T12:12:57.9671394Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9672187Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9672724Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9673058Z 
2025-12-04T12:12:57.9673270Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9674719Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9675789Z 
2025-12-04T12:12:57.9676050Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9676668Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9677122Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9677455Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9678000Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9678689Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9679124Z graph_break []
2025-12-04T12:12:57.9679489Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9680559Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9681496Z   warnings.warn(
2025-12-04T12:12:57.9681870Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9682404Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9682730Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9683146Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9683831Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9684400Z graph_break []
2025-12-04T12:12:57.9684750Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9685819Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9686770Z   warnings.warn(
2025-12-04T12:12:57.9687071Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9687909Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9688731Z Traceback (most recent call last):
2025-12-04T12:12:57.9689425Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9690273Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9690820Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9691202Z 
2025-12-04T12:12:57.9691415Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9692702Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9693769Z 
2025-12-04T12:12:57.9694027Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9694639Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9695102Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9695438Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9696094Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9696791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9697247Z graph_break []
2025-12-04T12:12:57.9697604Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9698684Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9699641Z   warnings.warn(
2025-12-04T12:12:57.9700057Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9700510Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9700998Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9701437Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9701768Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9701863Z graph_break []
2025-12-04T12:12:57.9702090Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9702802Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9702917Z   warnings.warn(
2025-12-04T12:12:57.9703131Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9703237Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9703367Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9703582Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9703911Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9704019Z graph_break []
2025-12-04T12:12:57.9704225Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9704949Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9705047Z   warnings.warn(
2025-12-04T12:12:57.9705848Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8928a6b00b051b8.xml -
2025-12-04T12:12:57.9706024Z =========================== short test summary info ============================
2025-12-04T12:12:57.9707090Z FAILED [0.1616s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9707099Z 
2025-12-04T12:12:57.9707320Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9708325Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9708331Z 
2025-12-04T12:12:57.9708631Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9708820Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9709014Z ================== 1 failed, 174 deselected, 2 rerun in 4.98s ==================
2025-12-04T12:12:57.9709123Z Got exit code 1
2025-12-04T12:12:57.9709233Z Retrying single test...
2025-12-04T12:12:57.9709855Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-638b7d3a6684657f.xml
2025-12-04T12:12:57.9710030Z ============================= test session starts ==============================
2025-12-04T12:12:57.9710370Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9710513Z cachedir: .pytest_cache
2025-12-04T12:12:57.9711037Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9711158Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9711275Z configfile: pytest.ini
2025-12-04T12:12:57.9711849Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9712071Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9713146Z stepcurrent: skipping 155 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9713260Z Running 1 items in this shard
2025-12-04T12:12:57.9713265Z 
2025-12-04T12:12:57.9714184Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5367s] [100%]
2025-12-04T12:12:57.9715083Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1580s] [100%]
2025-12-04T12:12:57.9715917Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1560s] [100%]
2025-12-04T12:12:57.9715925Z 
2025-12-04T12:12:57.9716062Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9716620Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9716752Z Traceback (most recent call last):
2025-12-04T12:12:57.9717216Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9717424Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9717632Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9717637Z 
2025-12-04T12:12:57.9717846Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9718797Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9718804Z 
2025-12-04T12:12:57.9719063Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9719287Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9719399Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9719512Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9719905Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9720122Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9720247Z graph_break []
2025-12-04T12:12:57.9720471Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9721190Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9721303Z   warnings.warn(
2025-12-04T12:12:57.9721863Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9721981Z Traceback (most recent call last):
2025-12-04T12:12:57.9722522Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9722757Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9722971Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9722993Z 
2025-12-04T12:12:57.9723201Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9724139Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9724178Z 
2025-12-04T12:12:57.9724453Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9724672Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9724782Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9724913Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9725248Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9725478Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9725575Z graph_break []
2025-12-04T12:12:57.9725784Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9726515Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9726616Z   warnings.warn(
2025-12-04T12:12:57.9726826Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9726951Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9727066Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9727293Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9727622Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9727718Z graph_break []
2025-12-04T12:12:57.9727942Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9728652Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9728757Z   warnings.warn(
2025-12-04T12:12:57.9728913Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9729475Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:57.9729615Z Traceback (most recent call last):
2025-12-04T12:12:57.9730078Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9730270Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9730496Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9730501Z 
2025-12-04T12:12:57.9730714Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9731709Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9731764Z 
2025-12-04T12:12:57.9732025Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9732237Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9732364Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9732479Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9732823Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9733037Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9733139Z graph_break []
2025-12-04T12:12:57.9733398Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9734116Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9734218Z   warnings.warn(
2025-12-04T12:12:57.9734442Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9734552Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9734665Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9734927Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9735253Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9735362Z graph_break []
2025-12-04T12:12:57.9735566Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9736280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9736391Z   warnings.warn(
2025-12-04T12:12:57.9736597Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9736717Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9736827Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9737040Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9737379Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9737476Z graph_break []
2025-12-04T12:12:57.9737682Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9738400Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9738494Z   warnings.warn(
2025-12-04T12:12:57.9739306Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-638b7d3a6684657f.xml -
2025-12-04T12:12:57.9739477Z =========================== short test summary info ============================
2025-12-04T12:12:57.9740534Z FAILED [0.1560s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9740542Z 
2025-12-04T12:12:57.9740764Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9741697Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9741703Z 
2025-12-04T12:12:57.9741976Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9742184Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9742379Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ==================
2025-12-04T12:12:57.9742525Z Got exit code 1
2025-12-04T12:12:57.9743374Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:57.9743788Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.9744410Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-28d8e196fd24a123.xml
2025-12-04T12:12:57.9744570Z ============================= test session starts ==============================
2025-12-04T12:12:57.9744958Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9745069Z cachedir: .pytest_cache
2025-12-04T12:12:57.9745588Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9745711Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9745818Z configfile: pytest.ini
2025-12-04T12:12:57.9746404Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9746658Z collecting ... collected 380 items / 156 deselected / 224 selected
2025-12-04T12:12:57.9746800Z stepcurrent: skipping 156 already run items.
2025-12-04T12:12:57.9746923Z Running 19 items in this shard
2025-12-04T12:12:57.9746928Z 
2025-12-04T12:12:57.9747948Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0041s] (Skip non-critical tests to save resources.) [  5%]
2025-12-04T12:12:57.9748972Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_False SKIPPED [0.0031s] (Skip non-critical tests to save resources.) [ 10%]
2025-12-04T12:12:57.9749966Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_True_initial_xblock_2_add_1dim_True SKIPPED [0.0038s] (Skip non-critical tests to save resources.) [ 15%]
2025-12-04T12:12:57.9750864Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.6001s] [ 21%]
2025-12-04T12:12:57.9751751Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1675s] [ 21%]
2025-12-04T12:12:57.9752558Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1639s] [ 21%]
2025-12-04T12:12:57.9752580Z 
2025-12-04T12:12:57.9752715Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9753261Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.9753392Z Traceback (most recent call last):
2025-12-04T12:12:57.9753852Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9754047Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9754263Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9754268Z 
2025-12-04T12:12:57.9754478Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9755467Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9755502Z 
2025-12-04T12:12:57.9755763Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9755989Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9756101Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9756215Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9756558Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9756770Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9756866Z graph_break []
2025-12-04T12:12:57.9757120Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9759784Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9759946Z   return x.grad, w.grad
2025-12-04T12:12:57.9760666Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9760779Z   warnings.warn(
2025-12-04T12:12:57.9763736Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9763851Z   return x.grad, w.grad
2025-12-04T12:12:57.9764435Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.9764561Z Traceback (most recent call last):
2025-12-04T12:12:57.9765051Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9765257Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9765473Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9765478Z 
2025-12-04T12:12:57.9765716Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9766668Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9766674Z 
2025-12-04T12:12:57.9766959Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9767178Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9767290Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9767419Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9767763Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9767998Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9768162Z graph_break []
2025-12-04T12:12:57.9768380Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9771178Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9771291Z   return x.grad, w.grad
2025-12-04T12:12:57.9772085Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9772190Z   warnings.warn(
2025-12-04T12:12:57.9774931Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9775077Z   return x.grad, w.grad
2025-12-04T12:12:57.9775418Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9775539Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9775654Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9775877Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9776219Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9776317Z graph_break []
2025-12-04T12:12:57.9776535Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9779174Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9779295Z   return x.grad, w.grad
2025-12-04T12:12:57.9780009Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9780110Z   warnings.warn(
2025-12-04T12:12:57.9782748Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9782854Z   return x.grad, w.grad
2025-12-04T12:12:57.9783007Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9783603Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.9783750Z Traceback (most recent call last):
2025-12-04T12:12:57.9784218Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9784410Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9784631Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9784636Z 
2025-12-04T12:12:57.9784844Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9785768Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9785786Z 
2025-12-04T12:12:57.9786080Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9786294Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9786418Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9786532Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9786863Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9787088Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9787223Z graph_break []
2025-12-04T12:12:57.9787436Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9790103Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9790214Z   return x.grad, w.grad
2025-12-04T12:12:57.9790937Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9791039Z   warnings.warn(
2025-12-04T12:12:57.9793691Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9793795Z   return x.grad, w.grad
2025-12-04T12:12:57.9794022Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9794130Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9794242Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9794467Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9794799Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9794892Z graph_break []
2025-12-04T12:12:57.9795112Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9797785Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9797928Z   return x.grad, w.grad
2025-12-04T12:12:57.9798639Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9798750Z   warnings.warn(
2025-12-04T12:12:57.9801658Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9801782Z   return x.grad, w.grad
2025-12-04T12:12:57.9801996Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9802201Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9802329Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9802551Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9802885Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9803010Z graph_break []
2025-12-04T12:12:57.9803226Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9803957Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9804062Z   warnings.warn(
2025-12-04T12:12:57.9806693Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9806816Z   return x.grad, w.grad
2025-12-04T12:12:57.9807616Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-28d8e196fd24a123.xml -
2025-12-04T12:12:57.9807803Z =========================== short test summary info ============================
2025-12-04T12:12:57.9808871Z FAILED [0.1639s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9808879Z 
2025-12-04T12:12:57.9809103Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9810031Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9810037Z 
2025-12-04T12:12:57.9810296Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9810541Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9810760Z ============ 1 failed, 3 skipped, 156 deselected, 2 rerun in 5.00s =============
2025-12-04T12:12:57.9810917Z Got exit code 1
2025-12-04T12:12:57.9811023Z Retrying single test...
2025-12-04T12:12:57.9811648Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-618da663b64859ce.xml
2025-12-04T12:12:57.9811823Z ============================= test session starts ==============================
2025-12-04T12:12:57.9812166Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9812273Z cachedir: .pytest_cache
2025-12-04T12:12:57.9812795Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9812915Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9813085Z configfile: pytest.ini
2025-12-04T12:12:57.9813663Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9813888Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9814914Z stepcurrent: skipping 159 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9815062Z Running 1 items in this shard
2025-12-04T12:12:57.9815067Z 
2025-12-04T12:12:57.9815970Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5689s] [100%]
2025-12-04T12:12:57.9816863Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1644s] [100%]
2025-12-04T12:12:57.9817685Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1624s] [100%]
2025-12-04T12:12:57.9817693Z 
2025-12-04T12:12:57.9817829Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9818378Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.9818513Z Traceback (most recent call last):
2025-12-04T12:12:57.9818972Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9819184Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9819388Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9819395Z 
2025-12-04T12:12:57.9819605Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9820544Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9820552Z 
2025-12-04T12:12:57.9820811Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9821041Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9821155Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9821267Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9821610Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9821824Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9821919Z graph_break []
2025-12-04T12:12:57.9822177Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9824822Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9824971Z   return x.grad, w.grad
2025-12-04T12:12:57.9825688Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9825798Z   warnings.warn(
2025-12-04T12:12:57.9828448Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9828608Z   return x.grad, w.grad
2025-12-04T12:12:57.9829151Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.9829269Z Traceback (most recent call last):
2025-12-04T12:12:57.9829737Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9829932Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9830138Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9830145Z 
2025-12-04T12:12:57.9830364Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9831287Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9831295Z 
2025-12-04T12:12:57.9831567Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9831777Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9831887Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9832013Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9832346Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9832576Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9832671Z graph_break []
2025-12-04T12:12:57.9832881Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9835532Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9835637Z   return x.grad, w.grad
2025-12-04T12:12:57.9836404Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9836530Z   warnings.warn(
2025-12-04T12:12:57.9839168Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9839273Z   return x.grad, w.grad
2025-12-04T12:12:57.9839488Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9839644Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9839764Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9839991Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9840326Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9840421Z graph_break []
2025-12-04T12:12:57.9840642Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9843347Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9843507Z   return x.grad, w.grad
2025-12-04T12:12:57.9844219Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9844320Z   warnings.warn(
2025-12-04T12:12:57.9846974Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9847081Z   return x.grad, w.grad
2025-12-04T12:12:57.9847238Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9847787Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.9847924Z Traceback (most recent call last):
2025-12-04T12:12:57.9848381Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9848576Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9848803Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9848808Z 
2025-12-04T12:12:57.9849017Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9849942Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9849961Z 
2025-12-04T12:12:57.9850254Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9850493Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9850617Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9850729Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9851058Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9851285Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9851380Z graph_break []
2025-12-04T12:12:57.9851605Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9854274Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9854393Z   return x.grad, w.grad
2025-12-04T12:12:57.9855107Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9855239Z   warnings.warn(
2025-12-04T12:12:57.9857887Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9857992Z   return x.grad, w.grad
2025-12-04T12:12:57.9858215Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9858324Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9858435Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9858667Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9858998Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9859093Z graph_break []
2025-12-04T12:12:57.9859315Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9861956Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9862076Z   return x.grad, w.grad
2025-12-04T12:12:57.9862787Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9862898Z   warnings.warn(
2025-12-04T12:12:57.9865562Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9865710Z   return x.grad, w.grad
2025-12-04T12:12:57.9865921Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9866030Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9866153Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9866370Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9866698Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9866805Z graph_break []
2025-12-04T12:12:57.9867050Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9867780Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9867880Z   warnings.warn(
2025-12-04T12:12:57.9870509Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9870655Z   return x.grad, w.grad
2025-12-04T12:12:57.9871456Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-618da663b64859ce.xml -
2025-12-04T12:12:57.9871638Z =========================== short test summary info ============================
2025-12-04T12:12:57.9872686Z FAILED [0.1624s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9872693Z 
2025-12-04T12:12:57.9872916Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9873837Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9873842Z 
2025-12-04T12:12:57.9874112Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9874290Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9874486Z ================== 1 failed, 174 deselected, 2 rerun in 4.95s ==================
2025-12-04T12:12:57.9874595Z Got exit code 1
2025-12-04T12:12:57.9874699Z Retrying single test...
2025-12-04T12:12:57.9875323Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0193ecefca06b5b7.xml
2025-12-04T12:12:57.9875495Z ============================= test session starts ==============================
2025-12-04T12:12:57.9875837Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9875957Z cachedir: .pytest_cache
2025-12-04T12:12:57.9876466Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9876587Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9876751Z configfile: pytest.ini
2025-12-04T12:12:57.9877331Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9877585Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:57.9878603Z stepcurrent: skipping 159 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9878718Z Running 1 items in this shard
2025-12-04T12:12:57.9878723Z 
2025-12-04T12:12:57.9879615Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [4.5787s] [100%]
2025-12-04T12:12:57.9880549Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True ('RERUN', {'yellow': True}) [0.1654s] [100%]
2025-12-04T12:12:57.9881372Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True FAILED [0.1650s] [100%]
2025-12-04T12:12:57.9881377Z 
2025-12-04T12:12:57.9881519Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9882101Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.9882296Z Traceback (most recent call last):
2025-12-04T12:12:57.9882755Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9882963Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9883171Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9883178Z 
2025-12-04T12:12:57.9883386Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9884322Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9884328Z 
2025-12-04T12:12:57.9884587Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9884815Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9884923Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9885034Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9885379Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9885595Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9885693Z graph_break []
2025-12-04T12:12:57.9885916Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9888573Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9888693Z   return x.grad, w.grad
2025-12-04T12:12:57.9889409Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9889520Z   warnings.warn(
2025-12-04T12:12:57.9892201Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9892349Z   return x.grad, w.grad
2025-12-04T12:12:57.9892900Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.9893020Z Traceback (most recent call last):
2025-12-04T12:12:57.9893524Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9893722Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9893935Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9893955Z 
2025-12-04T12:12:57.9894167Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9895093Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9895134Z 
2025-12-04T12:12:57.9895408Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9895621Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9895730Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9895858Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9896194Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9896425Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9896525Z graph_break []
2025-12-04T12:12:57.9896734Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9899390Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9899500Z   return x.grad, w.grad
2025-12-04T12:12:57.9900234Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9900336Z   warnings.warn(
2025-12-04T12:12:57.9903185Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9903294Z   return x.grad, w.grad
2025-12-04T12:12:57.9903510Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9903707Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9903823Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9904178Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9904512Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9904609Z graph_break []
2025-12-04T12:12:57.9904834Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9907517Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9907640Z   return x.grad, w.grad
2025-12-04T12:12:57.9908359Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9908460Z   warnings.warn(
2025-12-04T12:12:57.9911109Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9911262Z   return x.grad, w.grad
2025-12-04T12:12:57.9911422Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9911975Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True _
2025-12-04T12:12:57.9912109Z Traceback (most recent call last):
2025-12-04T12:12:57.9912566Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9912762Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9912989Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9912994Z 
2025-12-04T12:12:57.9913205Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9914152Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9914158Z 
2025-12-04T12:12:57.9914416Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9914631Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9914756Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9914869Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9915201Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9915429Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9915523Z graph_break []
2025-12-04T12:12:57.9915744Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9918425Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9918570Z   return x.grad, w.grad
2025-12-04T12:12:57.9919280Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9919379Z   warnings.warn(
2025-12-04T12:12:57.9922069Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9922227Z   return x.grad, w.grad
2025-12-04T12:12:57.9922455Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9922596Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9922708Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9922941Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9923274Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9923388Z graph_break []
2025-12-04T12:12:57.9923601Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9926238Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9926361Z   return x.grad, w.grad
2025-12-04T12:12:57.9927080Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9927192Z   warnings.warn(
2025-12-04T12:12:57.9929831Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9929953Z   return x.grad, w.grad
2025-12-04T12:12:57.9930166Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9930277Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9930405Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9930625Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9930965Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9931061Z graph_break []
2025-12-04T12:12:57.9931358Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9932136Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9932234Z   warnings.warn(
2025-12-04T12:12:57.9934861Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9934999Z   return x.grad, w.grad
2025-12-04T12:12:57.9935808Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0193ecefca06b5b7.xml -
2025-12-04T12:12:57.9935992Z =========================== short test summary info ============================
2025-12-04T12:12:57.9937044Z FAILED [0.1650s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9937081Z 
2025-12-04T12:12:57.9937304Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9938231Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9938238Z 
2025-12-04T12:12:57.9938513Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9938687Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:57.9938885Z ================== 1 failed, 174 deselected, 2 rerun in 4.96s ==================
2025-12-04T12:12:57.9938992Z Got exit code 1
2025-12-04T12:12:57.9939829Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True
2025-12-04T12:12:57.9940229Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:57.9940869Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-94e72f0552a6d934.xml
2025-12-04T12:12:57.9941028Z ============================= test session starts ==============================
2025-12-04T12:12:57.9941383Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:57.9941487Z cachedir: .pytest_cache
2025-12-04T12:12:57.9941995Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:57.9942127Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:57.9942234Z configfile: pytest.ini
2025-12-04T12:12:57.9942823Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:57.9943046Z collecting ... collected 380 items / 160 deselected / 220 selected
2025-12-04T12:12:57.9943189Z stepcurrent: skipping 160 already run items.
2025-12-04T12:12:57.9943318Z Running 15 items in this shard
2025-12-04T12:12:57.9943323Z 
2025-12-04T12:12:57.9944258Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5536s] [  6%]
2025-12-04T12:12:57.9945161Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1655s] [  6%]
2025-12-04T12:12:57.9946001Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1635s] [  6%]
2025-12-04T12:12:57.9946009Z 
2025-12-04T12:12:57.9946145Z ==================================== RERUNS ====================================
2025-12-04T12:12:57.9946706Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.9946824Z Traceback (most recent call last):
2025-12-04T12:12:57.9947328Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9947524Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9947732Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9947737Z 
2025-12-04T12:12:57.9947959Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9948883Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.9948922Z 
2025-12-04T12:12:57.9949198Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9949410Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9949519Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9949643Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9949982Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9950194Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9950306Z graph_break []
2025-12-04T12:12:57.9950515Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9953163Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9953269Z   return x.grad, w.grad
2025-12-04T12:12:57.9954004Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9954105Z   warnings.warn(
2025-12-04T12:12:57.9956750Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9956867Z   return x.grad, w.grad
2025-12-04T12:12:57.9957414Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.9957580Z Traceback (most recent call last):
2025-12-04T12:12:57.9958037Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9958274Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9958480Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9958485Z 
2025-12-04T12:12:57.9958691Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9959635Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.9959642Z 
2025-12-04T12:12:57.9959899Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9960158Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9960268Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9960379Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9960725Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9960939Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9961033Z graph_break []
2025-12-04T12:12:57.9961259Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9963998Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9964118Z   return x.grad, w.grad
2025-12-04T12:12:57.9964836Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9964949Z   warnings.warn(
2025-12-04T12:12:57.9967582Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9967702Z   return x.grad, w.grad
2025-12-04T12:12:57.9967914Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9968025Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9968150Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9968364Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9968695Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9968803Z graph_break []
2025-12-04T12:12:57.9969013Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9971704Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9971842Z   return x.grad, w.grad
2025-12-04T12:12:57.9972554Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9972666Z   warnings.warn(
2025-12-04T12:12:57.9975318Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9975440Z   return x.grad, w.grad
2025-12-04T12:12:57.9975582Z =================================== FAILURES ===================================
2025-12-04T12:12:57.9976144Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:57.9976296Z Traceback (most recent call last):
2025-12-04T12:12:57.9976751Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:57.9976961Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:57.9977169Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:57.9977175Z 
2025-12-04T12:12:57.9977399Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:57.9978331Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:57.9978338Z 
2025-12-04T12:12:57.9978595Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:57.9978819Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9978930Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9979055Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9979387Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9979598Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9979704Z graph_break []
2025-12-04T12:12:57.9979913Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9982559Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9982680Z   return x.grad, w.grad
2025-12-04T12:12:57.9983393Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9983504Z   warnings.warn(
2025-12-04T12:12:57.9986187Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9986333Z   return x.grad, w.grad
2025-12-04T12:12:57.9986552Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9986659Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9986803Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9987019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9987398Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9987497Z graph_break []
2025-12-04T12:12:57.9987711Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9990371Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9990509Z   return x.grad, w.grad
2025-12-04T12:12:57.9991243Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9991341Z   warnings.warn(
2025-12-04T12:12:57.9993982Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9994091Z   return x.grad, w.grad
2025-12-04T12:12:57.9994302Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:57.9994425Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:57.9994539Z stats [('calls_captured', 10)]
2025-12-04T12:12:57.9994773Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:57.9995104Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:57.9995204Z graph_break []
2025-12-04T12:12:57.9995430Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:57.9996142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:57.9996243Z   warnings.warn(
2025-12-04T12:12:57.9998922Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:57.9999059Z   return x.grad, w.grad
2025-12-04T12:12:57.9999875Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-94e72f0552a6d934.xml -
2025-12-04T12:12:58.0000046Z =========================== short test summary info ============================
2025-12-04T12:12:58.0001777Z FAILED [0.1635s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0001799Z 
2025-12-04T12:12:58.0002035Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0003247Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0003257Z 
2025-12-04T12:12:58.0003521Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0003699Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0003911Z ================== 1 failed, 160 deselected, 2 rerun in 4.94s ==================
2025-12-04T12:12:58.0004064Z Got exit code 1
2025-12-04T12:12:58.0004171Z Retrying single test...
2025-12-04T12:12:58.0004814Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8e966a64a8d91b0.xml
2025-12-04T12:12:58.0004976Z ============================= test session starts ==============================
2025-12-04T12:12:58.0005337Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0005444Z cachedir: .pytest_cache
2025-12-04T12:12:58.0005958Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0006095Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0006203Z configfile: pytest.ini
2025-12-04T12:12:58.0006777Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0007020Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:58.0008040Z stepcurrent: skipping 160 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0008169Z Running 1 items in this shard
2025-12-04T12:12:58.0008175Z 
2025-12-04T12:12:58.0009067Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.5728s] [100%]
2025-12-04T12:12:58.0009975Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1707s] [100%]
2025-12-04T12:12:58.0010785Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1641s] [100%]
2025-12-04T12:12:58.0010792Z 
2025-12-04T12:12:58.0010928Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0011486Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:58.0011604Z Traceback (most recent call last):
2025-12-04T12:12:58.0012139Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0012336Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0012582Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0012587Z 
2025-12-04T12:12:58.0012809Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0013734Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0013742Z 
2025-12-04T12:12:58.0014014Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0014227Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0014337Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0014489Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0014825Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0015056Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0015150Z graph_break []
2025-12-04T12:12:58.0015358Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0018025Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0018161Z   return x.grad, w.grad
2025-12-04T12:12:58.0018891Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0018990Z   warnings.warn(
2025-12-04T12:12:58.0021645Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0021749Z   return x.grad, w.grad
2025-12-04T12:12:58.0022300Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:58.0022432Z Traceback (most recent call last):
2025-12-04T12:12:58.0022888Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0023096Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0023304Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0023311Z 
2025-12-04T12:12:58.0023519Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0024462Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0024467Z 
2025-12-04T12:12:58.0024726Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0024987Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0025098Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0025237Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0025583Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0025795Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0025890Z graph_break []
2025-12-04T12:12:58.0026113Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0028794Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0028916Z   return x.grad, w.grad
2025-12-04T12:12:58.0029627Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0029736Z   warnings.warn(
2025-12-04T12:12:58.0032399Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0032517Z   return x.grad, w.grad
2025-12-04T12:12:58.0032730Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0032840Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0032965Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0033182Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0033512Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0033622Z graph_break []
2025-12-04T12:12:58.0033832Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0036476Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0036583Z   return x.grad, w.grad
2025-12-04T12:12:58.0037315Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0037416Z   warnings.warn(
2025-12-04T12:12:58.0040158Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0040310Z   return x.grad, w.grad
2025-12-04T12:12:58.0040455Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0041019Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:58.0041140Z Traceback (most recent call last):
2025-12-04T12:12:58.0041600Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0041808Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0042017Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0042022Z 
2025-12-04T12:12:58.0042344Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0043275Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0043283Z 
2025-12-04T12:12:58.0043545Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0043773Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0043929Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0044057Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0044387Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0044600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0044709Z graph_break []
2025-12-04T12:12:58.0044921Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0047596Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0047705Z   return x.grad, w.grad
2025-12-04T12:12:58.0048418Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0048530Z   warnings.warn(
2025-12-04T12:12:58.0051165Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0051287Z   return x.grad, w.grad
2025-12-04T12:12:58.0051500Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0051622Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0051732Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0051951Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0052325Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0052423Z graph_break []
2025-12-04T12:12:58.0052634Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0055331Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0055435Z   return x.grad, w.grad
2025-12-04T12:12:58.0056196Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0056297Z   warnings.warn(
2025-12-04T12:12:58.0058939Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0059075Z   return x.grad, w.grad
2025-12-04T12:12:58.0059283Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0059400Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0059514Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0059745Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0060075Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0060171Z graph_break []
2025-12-04T12:12:58.0060390Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0061106Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0061206Z   warnings.warn(
2025-12-04T12:12:58.0063852Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0063959Z   return x.grad, w.grad
2025-12-04T12:12:58.0064772Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8e966a64a8d91b0.xml -
2025-12-04T12:12:58.0064943Z =========================== short test summary info ============================
2025-12-04T12:12:58.0066006Z FAILED [0.1641s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0066011Z 
2025-12-04T12:12:58.0066223Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0067191Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0067223Z 
2025-12-04T12:12:58.0067485Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0067660Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0067867Z ================== 1 failed, 174 deselected, 2 rerun in 4.96s ==================
2025-12-04T12:12:58.0067966Z Got exit code 1
2025-12-04T12:12:58.0068070Z Retrying single test...
2025-12-04T12:12:58.0068711Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6498f30a7931ed78.xml
2025-12-04T12:12:58.0068875Z ============================= test session starts ==============================
2025-12-04T12:12:58.0069264Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0069372Z cachedir: .pytest_cache
2025-12-04T12:12:58.0069885Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0070021Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0070128Z configfile: pytest.ini
2025-12-04T12:12:58.0070715Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0070968Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:58.0071980Z stepcurrent: skipping 160 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0072102Z Running 1 items in this shard
2025-12-04T12:12:58.0072109Z 
2025-12-04T12:12:58.0072998Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [4.6072s] [100%]
2025-12-04T12:12:58.0073895Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True ('RERUN', {'yellow': True}) [0.1642s] [100%]
2025-12-04T12:12:58.0082054Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True FAILED [0.1616s] [100%]
2025-12-04T12:12:58.0082089Z 
2025-12-04T12:12:58.0082433Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0083013Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:58.0083142Z Traceback (most recent call last):
2025-12-04T12:12:58.0083634Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0083837Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0084044Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0084050Z 
2025-12-04T12:12:58.0084276Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0085214Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0085219Z 
2025-12-04T12:12:58.0085495Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0085715Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0085831Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0086087Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0086425Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0086679Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0086791Z graph_break []
2025-12-04T12:12:58.0087006Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0089720Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0089829Z   return x.grad, w.grad
2025-12-04T12:12:58.0090563Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0090707Z   warnings.warn(
2025-12-04T12:12:58.0093339Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0093501Z   return x.grad, w.grad
2025-12-04T12:12:58.0094053Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:58.0094187Z Traceback (most recent call last):
2025-12-04T12:12:58.0094641Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0094837Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0095049Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0095057Z 
2025-12-04T12:12:58.0095269Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0096207Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0096213Z 
2025-12-04T12:12:58.0096476Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0096693Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0096817Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0096928Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0097272Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0097485Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0097581Z graph_break []
2025-12-04T12:12:58.0097802Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0100479Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0100630Z   return x.grad, w.grad
2025-12-04T12:12:58.0101710Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0101816Z   warnings.warn(
2025-12-04T12:12:58.0104568Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0104676Z   return x.grad, w.grad
2025-12-04T12:12:58.0104904Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0105011Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0105129Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0105365Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0105748Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0105860Z graph_break []
2025-12-04T12:12:58.0106071Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0108713Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0108830Z   return x.grad, w.grad
2025-12-04T12:12:58.0109548Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0109659Z   warnings.warn(
2025-12-04T12:12:58.0112297Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0112415Z   return x.grad, w.grad
2025-12-04T12:12:58.0112556Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0113105Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True _
2025-12-04T12:12:58.0113235Z Traceback (most recent call last):
2025-12-04T12:12:58.0113694Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0113897Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0114105Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0114166Z 
2025-12-04T12:12:58.0114379Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0115362Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0115368Z 
2025-12-04T12:12:58.0115629Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0115862Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0115974Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0116087Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0116436Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0116655Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0116800Z graph_break []
2025-12-04T12:12:58.0117016Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0119671Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0119834Z   return x.grad, w.grad
2025-12-04T12:12:58.0120547Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0120662Z   warnings.warn(
2025-12-04T12:12:58.0123362Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0123490Z   return x.grad, w.grad
2025-12-04T12:12:58.0123701Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0123812Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0123939Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0124158Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0124492Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0124604Z graph_break []
2025-12-04T12:12:58.0124814Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0127463Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0127568Z   return x.grad, w.grad
2025-12-04T12:12:58.0128341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0128469Z   warnings.warn(
2025-12-04T12:12:58.0131103Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0131221Z   return x.grad, w.grad
2025-12-04T12:12:58.0131431Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0131586Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0131700Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0131917Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0132264Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0132360Z graph_break []
2025-12-04T12:12:58.0132585Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0133300Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0133431Z   warnings.warn(
2025-12-04T12:12:58.0136077Z /var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py:315: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:492.)
2025-12-04T12:12:58.0136183Z   return x.grad, w.grad
2025-12-04T12:12:58.0136993Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6498f30a7931ed78.xml -
2025-12-04T12:12:58.0137164Z =========================== short test summary info ============================
2025-12-04T12:12:58.0138233Z FAILED [0.1616s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0138239Z 
2025-12-04T12:12:58.0138454Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0139384Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0139404Z 
2025-12-04T12:12:58.0139663Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0139837Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0140046Z ================== 1 failed, 174 deselected, 2 rerun in 4.99s ==================
2025-12-04T12:12:58.0140144Z Got exit code 1
2025-12-04T12:12:58.0140988Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True
2025-12-04T12:12:58.0141401Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:58.0142060Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35a9228a36f00ca8.xml
2025-12-04T12:12:58.0142261Z ============================= test session starts ==============================
2025-12-04T12:12:58.0142606Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0142714Z cachedir: .pytest_cache
2025-12-04T12:12:58.0143235Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0143356Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0143461Z configfile: pytest.ini
2025-12-04T12:12:58.0144050Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0144301Z collecting ... collected 380 items / 161 deselected / 219 selected
2025-12-04T12:12:58.0144463Z stepcurrent: skipping 161 already run items.
2025-12-04T12:12:58.0144576Z Running 14 items in this shard
2025-12-04T12:12:58.0144583Z 
2025-12-04T12:12:58.0145584Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [  7%]
2025-12-04T12:12:58.0146485Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5574s] [ 14%]
2025-12-04T12:12:58.0147416Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1613s] [ 14%]
2025-12-04T12:12:58.0148243Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1595s] [ 14%]
2025-12-04T12:12:58.0148251Z 
2025-12-04T12:12:58.0148387Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0148951Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0149070Z Traceback (most recent call last):
2025-12-04T12:12:58.0149536Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0149748Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0149956Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0149961Z 
2025-12-04T12:12:58.0150182Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0151117Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0151124Z 
2025-12-04T12:12:58.0151384Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0151613Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0151722Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0151847Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0152178Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0152392Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0152502Z graph_break []
2025-12-04T12:12:58.0152711Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0153498Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0153612Z   warnings.warn(
2025-12-04T12:12:58.0154211Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0154343Z Traceback (most recent call last):
2025-12-04T12:12:58.0154799Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0154993Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0155212Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0155218Z 
2025-12-04T12:12:58.0155426Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0156410Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0156418Z 
2025-12-04T12:12:58.0156679Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0156895Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0157015Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0157129Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0157457Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0157716Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0157812Z graph_break []
2025-12-04T12:12:58.0158039Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0158754Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0158854Z   warnings.warn(
2025-12-04T12:12:58.0159080Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0159187Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0159302Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0159527Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0159857Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0159966Z graph_break []
2025-12-04T12:12:58.0160182Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0160890Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0161001Z   warnings.warn(
2025-12-04T12:12:58.0161141Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0161697Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0161829Z Traceback (most recent call last):
2025-12-04T12:12:58.0162355Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0162570Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0162777Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0162785Z 
2025-12-04T12:12:58.0162992Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0163935Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0163942Z 
2025-12-04T12:12:58.0164200Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0164478Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0164590Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0164735Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0165084Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0165300Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0165394Z graph_break []
2025-12-04T12:12:58.0165621Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0166341Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0166452Z   warnings.warn(
2025-12-04T12:12:58.0166663Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0166772Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0166935Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0167150Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0167482Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0167592Z graph_break []
2025-12-04T12:12:58.0167801Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0168524Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0168660Z   warnings.warn(
2025-12-04T12:12:58.0168868Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0168991Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0169099Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0169313Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0169657Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0169753Z graph_break []
2025-12-04T12:12:58.0169977Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0170686Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0170782Z   warnings.warn(
2025-12-04T12:12:58.0171597Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35a9228a36f00ca8.xml -
2025-12-04T12:12:58.0171764Z =========================== short test summary info ============================
2025-12-04T12:12:58.0172839Z FAILED [0.1595s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0172847Z 
2025-12-04T12:12:58.0173056Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0174000Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0174017Z 
2025-12-04T12:12:58.0174282Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0174460Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0174690Z ============ 1 failed, 1 skipped, 161 deselected, 2 rerun in 4.94s =============
2025-12-04T12:12:58.0174789Z Got exit code 1
2025-12-04T12:12:58.0174894Z Retrying single test...
2025-12-04T12:12:58.0175534Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6d31e9c231a839ae.xml
2025-12-04T12:12:58.0175735Z ============================= test session starts ==============================
2025-12-04T12:12:58.0176120Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0176229Z cachedir: .pytest_cache
2025-12-04T12:12:58.0176738Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0176871Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0176979Z configfile: pytest.ini
2025-12-04T12:12:58.0177551Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0177791Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:58.0178841Z stepcurrent: skipping 162 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0178973Z Running 1 items in this shard
2025-12-04T12:12:58.0178981Z 
2025-12-04T12:12:58.0179879Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5526s] [100%]
2025-12-04T12:12:58.0180783Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1608s] [100%]
2025-12-04T12:12:58.0181630Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1584s] [100%]
2025-12-04T12:12:58.0181635Z 
2025-12-04T12:12:58.0181775Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0182346Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0182471Z Traceback (most recent call last):
2025-12-04T12:12:58.0182944Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0183136Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0183344Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0183349Z 
2025-12-04T12:12:58.0183573Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0184497Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0184502Z 
2025-12-04T12:12:58.0184774Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0184987Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0185099Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0185227Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0185559Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0185773Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0185881Z graph_break []
2025-12-04T12:12:58.0186092Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0186820Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0186918Z   warnings.warn(
2025-12-04T12:12:58.0187477Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0187652Z Traceback (most recent call last):
2025-12-04T12:12:58.0188112Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0188349Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0188555Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0188560Z 
2025-12-04T12:12:58.0188767Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0189707Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0189713Z 
2025-12-04T12:12:58.0189973Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0190227Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0190339Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0190455Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0190804Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0191019Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0191115Z graph_break []
2025-12-04T12:12:58.0191344Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0192092Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0192207Z   warnings.warn(
2025-12-04T12:12:58.0192416Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0192524Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0192649Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0192870Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0193200Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0193312Z graph_break []
2025-12-04T12:12:58.0193519Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0194239Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0194339Z   warnings.warn(
2025-12-04T12:12:58.0194480Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0195046Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0195164Z Traceback (most recent call last):
2025-12-04T12:12:58.0195628Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0195832Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0196042Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0196047Z 
2025-12-04T12:12:58.0196265Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0197192Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0197199Z 
2025-12-04T12:12:58.0197459Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0197682Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0197792Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0197916Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0198297Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0198512Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0198655Z graph_break []
2025-12-04T12:12:58.0198866Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0199583Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0199697Z   warnings.warn(
2025-12-04T12:12:58.0199907Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0200033Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0200144Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0200360Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0200731Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0201045Z graph_break []
2025-12-04T12:12:58.0201372Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0202101Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0202252Z   warnings.warn(
2025-12-04T12:12:58.0202474Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0202674Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0202785Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0203012Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0203338Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0203432Z graph_break []
2025-12-04T12:12:58.0203654Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0204370Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0204487Z   warnings.warn(
2025-12-04T12:12:58.0205285Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6d31e9c231a839ae.xml -
2025-12-04T12:12:58.0205452Z =========================== short test summary info ============================
2025-12-04T12:12:58.0206532Z FAILED [0.1584s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0206538Z 
2025-12-04T12:12:58.0206750Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0207701Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0207710Z 
2025-12-04T12:12:58.0207972Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0208148Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0208358Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:58.0208459Z Got exit code 1
2025-12-04T12:12:58.0208577Z Retrying single test...
2025-12-04T12:12:58.0209202Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-15938e0b51a5f238.xml
2025-12-04T12:12:58.0209362Z ============================= test session starts ==============================
2025-12-04T12:12:58.0209721Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0209877Z cachedir: .pytest_cache
2025-12-04T12:12:58.0210391Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0210569Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0210676Z configfile: pytest.ini
2025-12-04T12:12:58.0211261Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0211484Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:58.0212499Z stepcurrent: skipping 162 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0212621Z Running 1 items in this shard
2025-12-04T12:12:58.0212626Z 
2025-12-04T12:12:58.0213558Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5434s] [100%]
2025-12-04T12:12:58.0214462Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1598s] [100%]
2025-12-04T12:12:58.0215276Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1592s] [100%]
2025-12-04T12:12:58.0215322Z 
2025-12-04T12:12:58.0215472Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0216028Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0216148Z Traceback (most recent call last):
2025-12-04T12:12:58.0216627Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0216827Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0217047Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0217052Z 
2025-12-04T12:12:58.0217259Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0218193Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0218200Z 
2025-12-04T12:12:58.0218472Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0218686Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0218812Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0218929Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0219263Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0219494Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0219591Z graph_break []
2025-12-04T12:12:58.0219802Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0220534Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0220634Z   warnings.warn(
2025-12-04T12:12:58.0221200Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0221318Z Traceback (most recent call last):
2025-12-04T12:12:58.0221777Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0222021Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0222227Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0222262Z 
2025-12-04T12:12:58.0222471Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0223411Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0223419Z 
2025-12-04T12:12:58.0223675Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0223900Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0224005Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0224144Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0224506Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0224723Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0224838Z graph_break []
2025-12-04T12:12:58.0225051Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0225761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0225874Z   warnings.warn(
2025-12-04T12:12:58.0226116Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0226243Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0226358Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0226573Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0226921Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0227019Z graph_break []
2025-12-04T12:12:58.0227233Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0227961Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0228064Z   warnings.warn(
2025-12-04T12:12:58.0228221Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0228775Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0228899Z Traceback (most recent call last):
2025-12-04T12:12:58.0229374Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0229566Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0229773Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0229794Z 
2025-12-04T12:12:58.0230005Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0230931Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0230939Z 
2025-12-04T12:12:58.0231215Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0231430Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0231553Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0231666Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0231996Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0232221Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0232315Z graph_break []
2025-12-04T12:12:58.0232561Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0233287Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0233417Z   warnings.warn(
2025-12-04T12:12:58.0233640Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0233748Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0233862Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0234092Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0234418Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0234517Z graph_break []
2025-12-04T12:12:58.0234743Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0235486Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0235597Z   warnings.warn(
2025-12-04T12:12:58.0235812Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0235918Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0236042Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0236253Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0236577Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0236799Z graph_break []
2025-12-04T12:12:58.0237007Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0237717Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0237828Z   warnings.warn(
2025-12-04T12:12:58.0238631Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-15938e0b51a5f238.xml -
2025-12-04T12:12:58.0238813Z =========================== short test summary info ============================
2025-12-04T12:12:58.0239874Z FAILED [0.1592s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0239882Z 
2025-12-04T12:12:58.0240106Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0241038Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0241043Z 
2025-12-04T12:12:58.0241304Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0241497Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0241693Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:58.0241806Z Got exit code 1
2025-12-04T12:12:58.0242722Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0243126Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:58.0243765Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2f0e8060bc3a964c.xml
2025-12-04T12:12:58.0243926Z ============================= test session starts ==============================
2025-12-04T12:12:58.0244322Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0244431Z cachedir: .pytest_cache
2025-12-04T12:12:58.0244940Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0245102Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0245208Z configfile: pytest.ini
2025-12-04T12:12:58.0245783Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0246025Z collecting ... collected 380 items / 163 deselected / 217 selected
2025-12-04T12:12:58.0246169Z stepcurrent: skipping 163 already run items.
2025-12-04T12:12:58.0246296Z Running 12 items in this shard
2025-12-04T12:12:58.0246301Z 
2025-12-04T12:12:58.0247340Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [  8%]
2025-12-04T12:12:58.0248344Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0029s] (Skip non-critical tests to save resources.) [ 16%]
2025-12-04T12:12:58.0249349Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0037s] (Skip non-critical tests to save resources.) [ 25%]
2025-12-04T12:12:58.0250274Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5536s] [ 33%]
2025-12-04T12:12:58.0251172Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1618s] [ 33%]
2025-12-04T12:12:58.0251978Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1580s] [ 33%]
2025-12-04T12:12:58.0251986Z 
2025-12-04T12:12:58.0252135Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0252684Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:58.0252806Z Traceback (most recent call last):
2025-12-04T12:12:58.0253282Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0253476Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0253694Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0253698Z 
2025-12-04T12:12:58.0253910Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0254838Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0254859Z 
2025-12-04T12:12:58.0255117Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0255330Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0255454Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0255566Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0255895Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0256124Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0256220Z graph_break []
2025-12-04T12:12:58.0256432Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0257197Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0257324Z   warnings.warn(
2025-12-04T12:12:58.0257890Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:58.0258008Z Traceback (most recent call last):
2025-12-04T12:12:58.0258469Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0258672Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0258876Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0258881Z 
2025-12-04T12:12:58.0259099Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0260060Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0260068Z 
2025-12-04T12:12:58.0260327Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0260550Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0260660Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0260818Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0261151Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0261363Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0261471Z graph_break []
2025-12-04T12:12:58.0261681Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0262399Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0262510Z   warnings.warn(
2025-12-04T12:12:58.0262719Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0262840Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0262950Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0263162Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0263506Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0263601Z graph_break []
2025-12-04T12:12:58.0263809Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0264532Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0264630Z   warnings.warn(
2025-12-04T12:12:58.0264784Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0265335Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:58.0265455Z Traceback (most recent call last):
2025-12-04T12:12:58.0265928Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0266121Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0266328Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0266345Z 
2025-12-04T12:12:58.0266554Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0267483Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0267488Z 
2025-12-04T12:12:58.0267796Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0268036Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0268142Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0268264Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0268595Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0268820Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0268914Z graph_break []
2025-12-04T12:12:58.0269121Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0269846Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0269943Z   warnings.warn(
2025-12-04T12:12:58.0270182Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0270301Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0270414Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0270639Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0270969Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0271062Z graph_break []
2025-12-04T12:12:58.0271284Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0272025Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0272122Z   warnings.warn(
2025-12-04T12:12:58.0272342Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0272448Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0272574Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0272791Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0273121Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0273227Z graph_break []
2025-12-04T12:12:58.0273434Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0274142Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0274254Z   warnings.warn(
2025-12-04T12:12:58.0275054Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2f0e8060bc3a964c.xml -
2025-12-04T12:12:58.0275233Z =========================== short test summary info ============================
2025-12-04T12:12:58.0276300Z FAILED [0.1580s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0276308Z 
2025-12-04T12:12:58.0276533Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0277464Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0277471Z 
2025-12-04T12:12:58.0277731Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0277921Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0278134Z ============ 1 failed, 3 skipped, 163 deselected, 2 rerun in 4.94s =============
2025-12-04T12:12:58.0278229Z Got exit code 1
2025-12-04T12:12:58.0278348Z Retrying single test...
2025-12-04T12:12:58.0279006Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a7bcf286e5b1017b.xml
2025-12-04T12:12:58.0279205Z ============================= test session starts ==============================
2025-12-04T12:12:58.0279544Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0279648Z cachedir: .pytest_cache
2025-12-04T12:12:58.0280166Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0280289Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0280405Z configfile: pytest.ini
2025-12-04T12:12:58.0280978Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0281244Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:58.0282343Z stepcurrent: skipping 166 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0282458Z Running 1 items in this shard
2025-12-04T12:12:58.0282463Z 
2025-12-04T12:12:58.0283371Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5473s] [100%]
2025-12-04T12:12:58.0284298Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1606s] [100%]
2025-12-04T12:12:58.0285107Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1565s] [100%]
2025-12-04T12:12:58.0285129Z 
2025-12-04T12:12:58.0285267Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0285823Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:58.0285959Z Traceback (most recent call last):
2025-12-04T12:12:58.0286419Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0286618Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0286841Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0286846Z 
2025-12-04T12:12:58.0287057Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0288006Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0288011Z 
2025-12-04T12:12:58.0288273Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0288488Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0288614Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0288729Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0289075Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0289294Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0289391Z graph_break []
2025-12-04T12:12:58.0289619Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0290345Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0290446Z   warnings.warn(
2025-12-04T12:12:58.0291044Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:58.0291194Z Traceback (most recent call last):
2025-12-04T12:12:58.0291674Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0291871Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0292079Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0292084Z 
2025-12-04T12:12:58.0292309Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0293237Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0293241Z 
2025-12-04T12:12:58.0293548Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0293758Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0293870Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0294000Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0294331Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0294546Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0294692Z graph_break []
2025-12-04T12:12:58.0294900Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0295628Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0295726Z   warnings.warn(
2025-12-04T12:12:58.0295940Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0296067Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0296179Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0296393Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0296730Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0296826Z graph_break []
2025-12-04T12:12:58.0297045Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0297761Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0297858Z   warnings.warn(
2025-12-04T12:12:58.0298010Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0298563Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:58.0298696Z Traceback (most recent call last):
2025-12-04T12:12:58.0299155Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0299347Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0299567Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0299572Z 
2025-12-04T12:12:58.0299781Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0300710Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0300728Z 
2025-12-04T12:12:58.0301280Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0301490Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0301617Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0301817Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0302148Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0302418Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0302511Z graph_break []
2025-12-04T12:12:58.0302733Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0303450Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0303548Z   warnings.warn(
2025-12-04T12:12:58.0303769Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0303876Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0303987Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0304256Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0304596Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0304706Z graph_break []
2025-12-04T12:12:58.0304916Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0305624Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0305778Z   warnings.warn(
2025-12-04T12:12:58.0305986Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0306096Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0306219Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0306434Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0306776Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0306871Z graph_break []
2025-12-04T12:12:58.0307083Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0307800Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0307895Z   warnings.warn(
2025-12-04T12:12:58.0308695Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a7bcf286e5b1017b.xml -
2025-12-04T12:12:58.0308875Z =========================== short test summary info ============================
2025-12-04T12:12:58.0309931Z FAILED [0.1565s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0309937Z 
2025-12-04T12:12:58.0310162Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0311092Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0311099Z 
2025-12-04T12:12:58.0311373Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0311546Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0311741Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:58.0311849Z Got exit code 1
2025-12-04T12:12:58.0311952Z Retrying single test...
2025-12-04T12:12:58.0312577Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c21fd94b2a445d75.xml
2025-12-04T12:12:58.0312754Z ============================= test session starts ==============================
2025-12-04T12:12:58.0313126Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0313279Z cachedir: .pytest_cache
2025-12-04T12:12:58.0313789Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0313910Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0314029Z configfile: pytest.ini
2025-12-04T12:12:58.0314606Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0314827Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:58.0315891Z stepcurrent: skipping 166 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0316005Z Running 1 items in this shard
2025-12-04T12:12:58.0316010Z 
2025-12-04T12:12:58.0316903Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [4.5529s] [100%]
2025-12-04T12:12:58.0317797Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False ('RERUN', {'yellow': True}) [0.1621s] [100%]
2025-12-04T12:12:58.0318650Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False FAILED [0.1610s] [100%]
2025-12-04T12:12:58.0318656Z 
2025-12-04T12:12:58.0318796Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0319347Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:58.0319479Z Traceback (most recent call last):
2025-12-04T12:12:58.0319944Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0320151Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0320356Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0320361Z 
2025-12-04T12:12:58.0320570Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0321515Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0321520Z 
2025-12-04T12:12:58.0321779Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0322007Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0322179Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0322299Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0322647Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0322862Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0322971Z graph_break []
2025-12-04T12:12:58.0323182Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0323902Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0324013Z   warnings.warn(
2025-12-04T12:12:58.0324563Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:58.0324683Z Traceback (most recent call last):
2025-12-04T12:12:58.0325200Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0325440Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0325660Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0325665Z 
2025-12-04T12:12:58.0325873Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0326800Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0326807Z 
2025-12-04T12:12:58.0327081Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0327294Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0327454Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0327567Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0327897Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0328126Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0328221Z graph_break []
2025-12-04T12:12:58.0328431Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0329155Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0329283Z   warnings.warn(
2025-12-04T12:12:58.0329504Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0329611Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0329721Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0329946Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0330273Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0330367Z graph_break []
2025-12-04T12:12:58.0330589Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0331296Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0331408Z   warnings.warn(
2025-12-04T12:12:58.0331548Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0332098Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False _
2025-12-04T12:12:58.0332228Z Traceback (most recent call last):
2025-12-04T12:12:58.0332689Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0332882Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0333099Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0333106Z 
2025-12-04T12:12:58.0333313Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0334249Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0334257Z 
2025-12-04T12:12:58.0334516Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0334740Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0334854Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0334966Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0335307Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0335580Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0335678Z graph_break []
2025-12-04T12:12:58.0335932Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0336648Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0336760Z   warnings.warn(
2025-12-04T12:12:58.0336967Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0337078Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0337206Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0337418Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0337746Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0337851Z graph_break []
2025-12-04T12:12:58.0338091Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0338801Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0338913Z   warnings.warn(
2025-12-04T12:12:58.0339120Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0339241Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0339353Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0339600Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0339939Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0340034Z graph_break []
2025-12-04T12:12:58.0340240Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0340967Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0341063Z   warnings.warn(
2025-12-04T12:12:58.0341889Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c21fd94b2a445d75.xml -
2025-12-04T12:12:58.0342054Z =========================== short test summary info ============================
2025-12-04T12:12:58.0343113Z FAILED [0.1610s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0343135Z 
2025-12-04T12:12:58.0343346Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0344279Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0344285Z 
2025-12-04T12:12:58.0344557Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0344737Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0344944Z ================== 1 failed, 174 deselected, 2 rerun in 4.93s ==================
2025-12-04T12:12:58.0345043Z Got exit code 1
2025-12-04T12:12:58.0345885Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False
2025-12-04T12:12:58.0346308Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:58.0346933Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e84bcc8fc890320e.xml
2025-12-04T12:12:58.0347141Z ============================= test session starts ==============================
2025-12-04T12:12:58.0347485Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0347622Z cachedir: .pytest_cache
2025-12-04T12:12:58.0348147Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0348269Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0348377Z configfile: pytest.ini
2025-12-04T12:12:58.0348969Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0349196Z collecting ... collected 380 items / 167 deselected / 213 selected
2025-12-04T12:12:58.0349356Z stepcurrent: skipping 167 already run items.
2025-12-04T12:12:58.0349471Z Running 8 items in this shard
2025-12-04T12:12:58.0349476Z 
2025-12-04T12:12:58.0350520Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 12%]
2025-12-04T12:12:58.0351434Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5521s] [ 25%]
2025-12-04T12:12:58.0352333Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1598s] [ 25%]
2025-12-04T12:12:58.0353190Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1583s] [ 25%]
2025-12-04T12:12:58.0353196Z 
2025-12-04T12:12:58.0353339Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0353904Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0354025Z Traceback (most recent call last):
2025-12-04T12:12:58.0354488Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0354697Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0354904Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0354909Z 
2025-12-04T12:12:58.0355117Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0356063Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0356068Z 
2025-12-04T12:12:58.0356330Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0356557Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0356673Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0356784Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0357132Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0357345Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0357458Z graph_break []
2025-12-04T12:12:58.0357666Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0358379Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0358489Z   warnings.warn(
2025-12-04T12:12:58.0359076Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0359194Z Traceback (most recent call last):
2025-12-04T12:12:58.0359690Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0359881Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0360098Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0360103Z 
2025-12-04T12:12:58.0360311Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0361239Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0361255Z 
2025-12-04T12:12:58.0361512Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0361759Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0361880Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0361993Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0362396Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0362624Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0362718Z graph_break []
2025-12-04T12:12:58.0362929Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0363698Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0363797Z   warnings.warn(
2025-12-04T12:12:58.0364018Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0364127Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0364239Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0364468Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0364794Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0364890Z graph_break []
2025-12-04T12:12:58.0365113Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0365829Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0365943Z   warnings.warn(
2025-12-04T12:12:58.0366083Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0366635Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0366772Z Traceback (most recent call last):
2025-12-04T12:12:58.0367238Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0367445Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0367653Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0367658Z 
2025-12-04T12:12:58.0367866Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0368807Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0368815Z 
2025-12-04T12:12:58.0369078Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0369302Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0369410Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0369522Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0369908Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0370123Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0370249Z graph_break []
2025-12-04T12:12:58.0370470Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0371182Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0371295Z   warnings.warn(
2025-12-04T12:12:58.0371502Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0371609Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0371734Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0371951Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0372310Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0372425Z graph_break []
2025-12-04T12:12:58.0372635Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0373358Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0373457Z   warnings.warn(
2025-12-04T12:12:58.0373664Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0373817Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0373927Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0374140Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0374477Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0374574Z graph_break []
2025-12-04T12:12:58.0374783Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0375503Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0375602Z   warnings.warn(
2025-12-04T12:12:58.0376410Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e84bcc8fc890320e.xml -
2025-12-04T12:12:58.0376575Z =========================== short test summary info ============================
2025-12-04T12:12:58.0377633Z FAILED [0.1583s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0377651Z 
2025-12-04T12:12:58.0377860Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0378791Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0378802Z 
2025-12-04T12:12:58.0379073Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0379246Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0379474Z ============ 1 failed, 1 skipped, 167 deselected, 2 rerun in 4.93s =============
2025-12-04T12:12:58.0379573Z Got exit code 1
2025-12-04T12:12:58.0379676Z Retrying single test...
2025-12-04T12:12:58.0380314Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-71b1acdff50f0444.xml
2025-12-04T12:12:58.0380471Z ============================= test session starts ==============================
2025-12-04T12:12:58.0380816Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0380966Z cachedir: .pytest_cache
2025-12-04T12:12:58.0381474Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0381638Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0381741Z configfile: pytest.ini
2025-12-04T12:12:58.0382311Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0382547Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:58.0383562Z stepcurrent: skipping 168 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0383687Z Running 1 items in this shard
2025-12-04T12:12:58.0383691Z 
2025-12-04T12:12:58.0384612Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5510s] [100%]
2025-12-04T12:12:58.0385494Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1599s] [100%]
2025-12-04T12:12:58.0386317Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1577s] [100%]
2025-12-04T12:12:58.0386352Z 
2025-12-04T12:12:58.0386491Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0387055Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0387177Z Traceback (most recent call last):
2025-12-04T12:12:58.0387640Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0387847Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0388051Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0388056Z 
2025-12-04T12:12:58.0388279Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0389212Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0389220Z 
2025-12-04T12:12:58.0389490Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0389703Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0389815Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0389940Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0390268Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0390486Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0390592Z graph_break []
2025-12-04T12:12:58.0390801Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0391529Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0391630Z   warnings.warn(
2025-12-04T12:12:58.0392181Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0392312Z Traceback (most recent call last):
2025-12-04T12:12:58.0392771Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0392995Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0393242Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0393247Z 
2025-12-04T12:12:58.0393456Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0394395Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0394402Z 
2025-12-04T12:12:58.0394661Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0394874Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0394998Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0395111Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0395490Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0395706Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0395804Z graph_break []
2025-12-04T12:12:58.0396026Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0396743Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0396880Z   warnings.warn(
2025-12-04T12:12:58.0397107Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0397218Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0397344Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0397560Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0397884Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0397994Z graph_break []
2025-12-04T12:12:58.0398205Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0398920Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0399032Z   warnings.warn(
2025-12-04T12:12:58.0399173Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0399739Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0399859Z Traceback (most recent call last):
2025-12-04T12:12:58.0400321Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0400529Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0400737Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0400742Z 
2025-12-04T12:12:58.0401237Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0402229Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0402236Z 
2025-12-04T12:12:58.0402503Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0402736Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0402847Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0402962Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0403313Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0403527Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0403640Z graph_break []
2025-12-04T12:12:58.0403932Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0404654Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0404814Z   warnings.warn(
2025-12-04T12:12:58.0405025Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0405133Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0405263Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0405477Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0405818Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0405914Z graph_break []
2025-12-04T12:12:58.0406125Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0406896Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0406997Z   warnings.warn(
2025-12-04T12:12:58.0407204Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0407329Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0407440Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0407669Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0408054Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0408150Z graph_break []
2025-12-04T12:12:58.0408374Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0409082Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0409183Z   warnings.warn(
2025-12-04T12:12:58.0410001Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-71b1acdff50f0444.xml -
2025-12-04T12:12:58.0410171Z =========================== short test summary info ============================
2025-12-04T12:12:58.0411252Z FAILED [0.1577s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0411260Z 
2025-12-04T12:12:58.0411472Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0412414Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0412419Z 
2025-12-04T12:12:58.0412685Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0412863Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0413078Z ================== 1 failed, 174 deselected, 2 rerun in 4.92s ==================
2025-12-04T12:12:58.0413176Z Got exit code 1
2025-12-04T12:12:58.0413283Z Retrying single test...
2025-12-04T12:12:58.0413929Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f8b74fab1a7c01df.xml
2025-12-04T12:12:58.0414089Z ============================= test session starts ==============================
2025-12-04T12:12:58.0414442Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0414550Z cachedir: .pytest_cache
2025-12-04T12:12:58.0415055Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0415228Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0415338Z configfile: pytest.ini
2025-12-04T12:12:58.0415952Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0416171Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:58.0417182Z stepcurrent: skipping 168 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0417307Z Running 1 items in this shard
2025-12-04T12:12:58.0417312Z 
2025-12-04T12:12:58.0418211Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5367s] [100%]
2025-12-04T12:12:58.0419228Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1599s] [100%]
2025-12-04T12:12:58.0420044Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1561s] [100%]
2025-12-04T12:12:58.0420049Z 
2025-12-04T12:12:58.0420201Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0420778Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0420895Z Traceback (most recent call last):
2025-12-04T12:12:58.0421369Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0421563Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0421771Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0421776Z 
2025-12-04T12:12:58.0421998Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0422931Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0422936Z 
2025-12-04T12:12:58.0423209Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0423422Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0423534Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0423658Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0423989Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0424215Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0424313Z graph_break []
2025-12-04T12:12:58.0424520Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0425253Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0425353Z   warnings.warn(
2025-12-04T12:12:58.0425903Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0426034Z Traceback (most recent call last):
2025-12-04T12:12:58.0426495Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0426700Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0426906Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0426913Z 
2025-12-04T12:12:58.0427149Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0428094Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0428129Z 
2025-12-04T12:12:58.0428390Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0428615Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0428725Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0428837Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0429179Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0429391Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0429485Z graph_break []
2025-12-04T12:12:58.0429736Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0430452Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0430565Z   warnings.warn(
2025-12-04T12:12:58.0430772Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0430880Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0431002Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0431247Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0431576Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0431683Z graph_break []
2025-12-04T12:12:58.0431892Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0432619Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0432718Z   warnings.warn(
2025-12-04T12:12:58.0432860Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0433427Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0433547Z Traceback (most recent call last):
2025-12-04T12:12:58.0434016Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0434208Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0434417Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0434422Z 
2025-12-04T12:12:58.0434645Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0435582Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0435589Z 
2025-12-04T12:12:58.0435861Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0436070Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0436180Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0436303Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0436633Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0436847Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0436956Z graph_break []
2025-12-04T12:12:58.0437163Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0437924Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0438021Z   warnings.warn(
2025-12-04T12:12:58.0438228Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0438378Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0438487Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0438701Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0439038Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0439136Z graph_break []
2025-12-04T12:12:58.0439359Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0440069Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0440165Z   warnings.warn(
2025-12-04T12:12:58.0440415Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0440524Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0440633Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0440863Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0441191Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0441299Z graph_break []
2025-12-04T12:12:58.0441506Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0442320Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0442434Z   warnings.warn(
2025-12-04T12:12:58.0443238Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f8b74fab1a7c01df.xml -
2025-12-04T12:12:58.0443407Z =========================== short test summary info ============================
2025-12-04T12:12:58.0444478Z FAILED [0.1561s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0444486Z 
2025-12-04T12:12:58.0444697Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0445637Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0445644Z 
2025-12-04T12:12:58.0445902Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0446091Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0446291Z ================== 1 failed, 174 deselected, 2 rerun in 4.90s ==================
2025-12-04T12:12:58.0446388Z Got exit code 1
2025-12-04T12:12:58.0447245Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0447647Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:58.0448285Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cda23e9a2cebd271.xml
2025-12-04T12:12:58.0448447Z ============================= test session starts ==============================
2025-12-04T12:12:58.0448786Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0448905Z cachedir: .pytest_cache
2025-12-04T12:12:58.0449448Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0449570Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0449716Z configfile: pytest.ini
2025-12-04T12:12:58.0450290Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0450523Z collecting ... collected 380 items / 169 deselected / 211 selected
2025-12-04T12:12:58.0450666Z stepcurrent: skipping 169 already run items.
2025-12-04T12:12:58.0450780Z Running 6 items in this shard
2025-12-04T12:12:58.0450785Z 
2025-12-04T12:12:58.0451808Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 16%]
2025-12-04T12:12:58.0452748Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5557s] [ 33%]
2025-12-04T12:12:58.0453648Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1585s] [ 33%]
2025-12-04T12:12:58.0454468Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1551s] [ 33%]
2025-12-04T12:12:58.0454503Z 
2025-12-04T12:12:58.0454655Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0455204Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0455324Z Traceback (most recent call last):
2025-12-04T12:12:58.0455804Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0456000Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0456209Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0456227Z 
2025-12-04T12:12:58.0456436Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0457367Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0457374Z 
2025-12-04T12:12:58.0457647Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0457861Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0457985Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0458096Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0458431Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0458657Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0458754Z graph_break []
2025-12-04T12:12:58.0458964Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0459692Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0459792Z   warnings.warn(
2025-12-04T12:12:58.0460356Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0460474Z Traceback (most recent call last):
2025-12-04T12:12:58.0460933Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0461138Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0461371Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0461377Z 
2025-12-04T12:12:58.0461612Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0462554Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0462559Z 
2025-12-04T12:12:58.0462819Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0463042Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0463160Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0463272Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0463616Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0463860Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0463976Z graph_break []
2025-12-04T12:12:58.0464186Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0464905Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0465019Z   warnings.warn(
2025-12-04T12:12:58.0465226Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0465364Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0465490Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0465705Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0466048Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0466146Z graph_break []
2025-12-04T12:12:58.0466356Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0467084Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0467186Z   warnings.warn(
2025-12-04T12:12:58.0467327Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0467901Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0468022Z Traceback (most recent call last):
2025-12-04T12:12:58.0468498Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0468690Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0468895Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0468901Z 
2025-12-04T12:12:58.0469128Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0470060Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0470067Z 
2025-12-04T12:12:58.0470345Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0470555Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0470668Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0470797Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0471128Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0471339Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0471452Z graph_break []
2025-12-04T12:12:58.0471660Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0472421Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0472551Z   warnings.warn(
2025-12-04T12:12:58.0472758Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0472878Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0472987Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0473199Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0473544Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0473639Z graph_break []
2025-12-04T12:12:58.0473858Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0474598Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0474698Z   warnings.warn(
2025-12-04T12:12:58.0474920Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0475029Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0475138Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0475359Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0475683Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0475843Z graph_break []
2025-12-04T12:12:58.0476051Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0476760Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0476868Z   warnings.warn(
2025-12-04T12:12:58.0477672Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cda23e9a2cebd271.xml -
2025-12-04T12:12:58.0477854Z =========================== short test summary info ============================
2025-12-04T12:12:58.0478911Z FAILED [0.1551s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0478918Z 
2025-12-04T12:12:58.0479127Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0480070Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0480076Z 
2025-12-04T12:12:58.0480332Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0480522Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0480740Z ============ 1 failed, 1 skipped, 169 deselected, 2 rerun in 4.93s =============
2025-12-04T12:12:58.0480837Z Got exit code 1
2025-12-04T12:12:58.0480955Z Retrying single test...
2025-12-04T12:12:58.0481582Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6ef0f921a65804fa.xml
2025-12-04T12:12:58.0481754Z ============================= test session starts ==============================
2025-12-04T12:12:58.0482095Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0482267Z cachedir: .pytest_cache
2025-12-04T12:12:58.0482791Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0482912Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0483021Z configfile: pytest.ini
2025-12-04T12:12:58.0483649Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0483903Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:58.0484934Z stepcurrent: skipping 170 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0485049Z Running 1 items in this shard
2025-12-04T12:12:58.0485054Z 
2025-12-04T12:12:58.0485947Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5407s] [100%]
2025-12-04T12:12:58.0486881Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1573s] [100%]
2025-12-04T12:12:58.0487693Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1559s] [100%]
2025-12-04T12:12:58.0487701Z 
2025-12-04T12:12:58.0487852Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0488405Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0488568Z Traceback (most recent call last):
2025-12-04T12:12:58.0489030Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0489220Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0489442Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0489449Z 
2025-12-04T12:12:58.0489661Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0490601Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0490609Z 
2025-12-04T12:12:58.0490868Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0491084Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0491205Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0491317Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0491649Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0491872Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0491968Z graph_break []
2025-12-04T12:12:58.0492194Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0492911Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0493012Z   warnings.warn(
2025-12-04T12:12:58.0493576Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0493695Z Traceback (most recent call last):
2025-12-04T12:12:58.0494158Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0494365Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0494571Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0494576Z 
2025-12-04T12:12:58.0494797Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0495756Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0495804Z 
2025-12-04T12:12:58.0496076Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0496288Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0496397Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0496525Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0496856Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0497066Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0497176Z graph_break []
2025-12-04T12:12:58.0497387Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0498147Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0498250Z   warnings.warn(
2025-12-04T12:12:58.0498462Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0498583Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0498696Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0498908Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0499277Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0499372Z graph_break []
2025-12-04T12:12:58.0499585Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0500306Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0500409Z   warnings.warn(
2025-12-04T12:12:58.0500562Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0501379Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0501504Z Traceback (most recent call last):
2025-12-04T12:12:58.0501977Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0502176Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0502398Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0502403Z 
2025-12-04T12:12:58.0502612Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0503545Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0503550Z 
2025-12-04T12:12:58.0503824Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0504038Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0504162Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0504275Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0504604Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0504829Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0504925Z graph_break []
2025-12-04T12:12:58.0505133Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0505857Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0505955Z   warnings.warn(
2025-12-04T12:12:58.0506260Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0506367Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0506522Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0506752Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0507084Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0507177Z graph_break []
2025-12-04T12:12:58.0507419Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0508129Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0508240Z   warnings.warn(
2025-12-04T12:12:58.0508448Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0508557Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0508725Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0508942Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0509270Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0509380Z graph_break []
2025-12-04T12:12:58.0509588Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0510307Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0510450Z   warnings.warn(
2025-12-04T12:12:58.0511248Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6ef0f921a65804fa.xml -
2025-12-04T12:12:58.0511429Z =========================== short test summary info ============================
2025-12-04T12:12:58.0512493Z FAILED [0.1559s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0512501Z 
2025-12-04T12:12:58.0512731Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0513665Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0513672Z 
2025-12-04T12:12:58.0513931Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0514118Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0514315Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:58.0514431Z Got exit code 1
2025-12-04T12:12:58.0514537Z Retrying single test...
2025-12-04T12:12:58.0515165Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4beacf6124c4825f.xml
2025-12-04T12:12:58.0515338Z ============================= test session starts ==============================
2025-12-04T12:12:58.0515676Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0515782Z cachedir: .pytest_cache
2025-12-04T12:12:58.0516302Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0516422Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0516540Z configfile: pytest.ini
2025-12-04T12:12:58.0517109Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0517334Z collecting ... collected 380 items / 174 deselected / 206 selected
2025-12-04T12:12:58.0518409Z stepcurrent: skipping 170 already run items. Running only test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0518549Z Running 1 items in this shard
2025-12-04T12:12:58.0518555Z 
2025-12-04T12:12:58.0519459Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [4.5398s] [100%]
2025-12-04T12:12:58.0520350Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False ('RERUN', {'yellow': True}) [0.1609s] [100%]
2025-12-04T12:12:58.0521204Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False FAILED [0.1575s] [100%]
2025-12-04T12:12:58.0521211Z 
2025-12-04T12:12:58.0521350Z ==================================== RERUNS ====================================
2025-12-04T12:12:58.0521898Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0522029Z Traceback (most recent call last):
2025-12-04T12:12:58.0522551Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0522797Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0523006Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0523011Z 
2025-12-04T12:12:58.0523222Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0524179Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0524185Z 
2025-12-04T12:12:58.0524447Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0524677Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0524789Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0524903Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0525251Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0525468Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0525565Z graph_break []
2025-12-04T12:12:58.0525793Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0526513Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0526629Z   warnings.warn(
2025-12-04T12:12:58.0527182Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0527303Z Traceback (most recent call last):
2025-12-04T12:12:58.0527776Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0527970Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0528177Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0528182Z 
2025-12-04T12:12:58.0528407Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0529332Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0529337Z 
2025-12-04T12:12:58.0529646Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0529861Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0530000Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0530130Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0530459Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0530687Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0530786Z graph_break []
2025-12-04T12:12:58.0530998Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0531727Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0531828Z   warnings.warn(
2025-12-04T12:12:58.0532069Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0532196Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0532310Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0532543Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0532871Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0532967Z graph_break []
2025-12-04T12:12:58.0533189Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0533929Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0534025Z   warnings.warn(
2025-12-04T12:12:58.0534179Z =================================== FAILURES ===================================
2025-12-04T12:12:58.0534734Z _ NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False _
2025-12-04T12:12:58.0534869Z Traceback (most recent call last):
2025-12-04T12:12:58.0535326Z   File "/var/lib/jenkins/workspace/test/inductor/test_mix_order_reduction.py", line 346, in test_rms_norm_bwd
2025-12-04T12:12:58.0535521Z     act, (_, bwd_wrapper) = utils.run_and_get_code(fwd_bwd, opt_f)
2025-12-04T12:12:58.0535740Z ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0535745Z 
2025-12-04T12:12:58.0535953Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0536894Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0536899Z 
2025-12-04T12:12:58.0537156Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0537366Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0537488Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0537600Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0537929Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0538151Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0538245Z graph_break []
2025-12-04T12:12:58.0538466Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0539178Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0539277Z   warnings.warn(
2025-12-04T12:12:58.0539498Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0539604Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0539717Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0539988Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0540312Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0540451Z graph_break []
2025-12-04T12:12:58.0540664Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0541373Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0541489Z   warnings.warn(
2025-12-04T12:12:58.0541697Z ----------------------------- Captured stdout call -----------------------------
2025-12-04T12:12:58.0541805Z frames [('total', 1), ('ok', 1)]
2025-12-04T12:12:58.0541929Z stats [('calls_captured', 10)]
2025-12-04T12:12:58.0542146Z aot_autograd [('total', 1), ('autograd_cache_miss', 1), ('not_ok', 1)]
2025-12-04T12:12:58.0542513Z inductor [('pattern_matcher_count', 1), ('pattern_matcher_nodes', 1), ('fxgraph_cache_miss', 1)]
2025-12-04T12:12:58.0542611Z graph_break []
2025-12-04T12:12:58.0542818Z ----------------------------- Captured stderr call -----------------------------
2025-12-04T12:12:58.0543543Z /opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:2891: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping
2025-12-04T12:12:58.0543642Z   warnings.warn(
2025-12-04T12:12:58.0544441Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4beacf6124c4825f.xml -
2025-12-04T12:12:58.0544650Z =========================== short test summary info ============================
2025-12-04T12:12:58.0545706Z FAILED [0.1575s] inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False - ValueError: not enough values to unpack (expected 2, got 0)
2025-12-04T12:12:58.0545714Z 
2025-12-04T12:12:58.0545942Z To execute this test, run the following from the base repo dir:
2025-12-04T12:12:58.0546868Z     PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=1 python test/inductor/test_mix_order_reduction.py NoMixOrderReductionTest.test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0546876Z 
2025-12-04T12:12:58.0547146Z This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
2025-12-04T12:12:58.0547318Z !!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!
2025-12-04T12:12:58.0547514Z ================== 1 failed, 174 deselected, 2 rerun in 4.91s ==================
2025-12-04T12:12:58.0547624Z Got exit code 1
2025-12-04T12:12:58.0548468Z FAILED CONSISTENTLY: test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False
2025-12-04T12:12:58.0548884Z Test failed consistently, continuing with the rest of the tests due to continue-through-error being set
2025-12-04T12:12:58.0549507Z Test results will be stored in test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8d5eb132574c3bbb.xml
2025-12-04T12:12:58.0549670Z ============================= test session starts ==============================
2025-12-04T12:12:58.0550025Z platform linux -- Python 3.10.14, pytest-7.3.2, pluggy-1.6.0 -- /opt/conda/envs/py_3.10/bin/python
2025-12-04T12:12:58.0550132Z cachedir: .pytest_cache
2025-12-04T12:12:58.0550650Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2025-12-04T12:12:58.0550772Z rootdir: /var/lib/jenkins/workspace
2025-12-04T12:12:58.0550879Z configfile: pytest.ini
2025-12-04T12:12:58.0551463Z plugins: hypothesis-6.56.4, cpp-2.3.0, flakefinder-1.1.0, rerunfailures-14.0, subtests-0.13.1, xdist-3.3.1, xdoctest-1.3.0, anyio-4.12.0, typeguard-4.3.0
2025-12-04T12:12:58.0551717Z collecting ... collected 380 items / 171 deselected / 209 selected
2025-12-04T12:12:58.0551864Z stepcurrent: skipping 171 already run items.
2025-12-04T12:12:58.0552018Z Running 4 items in this shard
2025-12-04T12:12:58.0552023Z 
2025-12-04T12:12:58.0553034Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_True SKIPPED [0.0040s] (Skip non-critical tests to save resources.) [ 25%]
2025-12-04T12:12:58.0554049Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_False SKIPPED [0.0030s] (Skip non-critical tests to save resources.) [ 50%]
2025-12-04T12:12:58.0555071Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_True_initial_xblock_1_add_1dim_True SKIPPED [0.0035s] (Skip non-critical tests to save resources.) [ 75%]
2025-12-04T12:12:58.0555803Z inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_with_dynamic_shape_dynamic_dims1 SKIPPED [0.0027s] (Mix order reduction not enabled) [100%]
2025-12-04T12:12:58.0555811Z 
2025-12-04T12:12:58.0556614Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8d5eb132574c3bbb.xml -
2025-12-04T12:12:58.0556797Z ====================== 4 skipped, 171 deselected in 0.06s ======================
2025-12-04T12:12:58.0591720Z The following tests failed consistently: ['test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape2_max_autotune_True_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 
'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape0_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape2_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False', 
'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::MixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_layer_norm_bwd_with_bias_bfloat16_split_reductions_False_shape1', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape0_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 
'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_bfloat16_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape1_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_False_shape3_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_1_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape0_max_autotune_False_initial_xblock_2_add_1dim_True', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape1_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_1_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape2_max_autotune_False_initial_xblock_2_add_1dim_False', 'test/inductor/test_mix_order_reduction.py::NoMixOrderReductionTest::test_rms_norm_bwd_float32_split_reductions_True_shape3_max_autotune_False_initial_xblock_2_add_1dim_False']
2025-12-04T12:12:58.0591930Z 
2025-12-04T12:12:58.0592535Z FINISHED PRINTING LOG FILE of inductor/test_mix_order_reduction 1/2 (test/test-reports/inductor.test_mix_order_reduction_1.2_f2061367e8c27b7f_.log)
2025-12-04T12:12:58.0592542Z 
2025-12-04T12:12:58.0592936Z Finished inductor/test_mix_order_reduction 1/2 ... [2025-12-04 12:12:57.443674][10735.053576991], took 40.64min
2025-12-04T12:12:58.0593788Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-25ac053e9312843a.xml
2025-12-04T12:12:58.0594749Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-44c34e945447da70.xml
2025-12-04T12:12:58.0595599Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-faae5acc9f254e31.xml
2025-12-04T12:12:58.0596443Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1475705e30056d51.xml
2025-12-04T12:12:58.0597301Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-91702530804e6018.xml
2025-12-04T12:12:58.0598140Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5a377a8e3e546caa.xml
2025-12-04T12:12:58.0599009Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2d9eb46c30fffb97.xml
2025-12-04T12:12:58.0599865Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b2fcdf54f0dd8b56.xml
2025-12-04T12:12:58.0600721Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e6655594e475c158.xml
2025-12-04T12:12:58.0602255Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67b37ef947e223df.xml
2025-12-04T12:12:58.0603163Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5d8e49cfad949fb4.xml
2025-12-04T12:12:58.0604029Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b32d481ee6a300b7.xml
2025-12-04T12:12:58.0604958Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-de77de01625a8457.xml
2025-12-04T12:12:58.0605818Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ac8e542231b9ece8.xml
2025-12-04T12:12:58.0606712Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8a7277668f29c6c0.xml
2025-12-04T12:12:58.0949972Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cc01ae0bb83689a0.xml
2025-12-04T12:12:58.1298787Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-99d5fd7f63dbe293.xml
2025-12-04T12:12:58.1696064Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-826bb35711c419f6.xml
2025-12-04T12:12:58.2076438Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e5e187e59c02465d.xml
2025-12-04T12:12:58.2402341Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4a8119bc665e27c0.xml
2025-12-04T12:12:58.2868060Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2025dabe1cea3938.xml
2025-12-04T12:12:58.3243787Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-9d80dad9de413e50.xml
2025-12-04T12:12:58.3688093Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d8878b3838c421bc.xml
2025-12-04T12:12:58.4148322Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ccdefc43a9a17fe4.xml
2025-12-04T12:12:58.4482346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f6f73f3414e84f03.xml
2025-12-04T12:12:58.4916007Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6fa30bf2f2d5eb51.xml
2025-12-04T12:12:58.5272134Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-38918fbd281ed213.xml
2025-12-04T12:12:58.5667915Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1f043ea296196952.xml
2025-12-04T12:12:58.6005719Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5fb869340c48ef2f.xml
2025-12-04T12:12:58.6329597Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0d5e946f00308484.xml
2025-12-04T12:12:58.6779253Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c576f4628ae22849.xml
2025-12-04T12:12:58.7055320Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-33928f913f155d05.xml
2025-12-04T12:12:58.7367394Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-56a939c64c979699.xml
2025-12-04T12:12:58.7681055Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d3b58dae1e6fa80b.xml
2025-12-04T12:12:58.8228022Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ece32ee31ed5f94b.xml
2025-12-04T12:12:58.8551913Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ff6bee4ccf71b3b1.xml
2025-12-04T12:12:58.9241224Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-79a731795b247695.xml
2025-12-04T12:12:58.9595042Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-964095f569ab5f18.xml
2025-12-04T12:12:58.9896871Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c7f8a2bcbf5a7d94.xml
2025-12-04T12:12:59.0242557Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b44a26383ab5bf86.xml
2025-12-04T12:12:59.0530097Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-7c4c7b2c97f5ece3.xml
2025-12-04T12:12:59.0863314Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35670228d9257748.xml
2025-12-04T12:12:59.1147616Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c465169b2a187708.xml
2025-12-04T12:12:59.1433957Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2b01ab5056f11e9c.xml
2025-12-04T12:12:59.1763256Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-16aca9496f35b1a4.xml
2025-12-04T12:12:59.2172212Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-193d78131cdd083a.xml
2025-12-04T12:12:59.2493974Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dab7c947d86aa9a6.xml
2025-12-04T12:12:59.2772228Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a0e52521b9f6fa85.xml
2025-12-04T12:12:59.3084960Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5bf2204027ce2523.xml
2025-12-04T12:12:59.3639783Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d6d9569795b0b902.xml
2025-12-04T12:12:59.4016780Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4c910b821c44d2f5.xml
2025-12-04T12:12:59.4410269Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-95077883b5abbff3.xml
2025-12-04T12:12:59.4726115Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4d3bae777d67a79f.xml
2025-12-04T12:12:59.5029829Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-118ad8744f1d4d27.xml
2025-12-04T12:12:59.5531942Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61456af580a4b7ac.xml
2025-12-04T12:12:59.5915467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e90a690ff72dc1ab.xml
2025-12-04T12:12:59.6262205Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6357b547ca746444.xml
2025-12-04T12:12:59.6777497Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e10adef85f4d6151.xml
2025-12-04T12:12:59.7075322Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-3c2c7e3f96ee06db.xml
2025-12-04T12:12:59.7357067Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-111a9f95bebe1e39.xml
2025-12-04T12:12:59.7667300Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-87f44cfa0e8a9d8f.xml
2025-12-04T12:12:59.7948826Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ffc35ad917f63350.xml
2025-12-04T12:12:59.8302501Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bb2bca61f02d857f.xml
2025-12-04T12:12:59.8630311Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-17f448aea025f304.xml
2025-12-04T12:12:59.8950637Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85582e9ee40ebc55.xml
2025-12-04T12:12:59.9266945Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c795322010e61bce.xml
2025-12-04T12:12:59.9627672Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-af1ce6171d14e609.xml
2025-12-04T12:12:59.9996664Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-00b52dc1e610ac68.xml
2025-12-04T12:13:00.0595111Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-40be700c41c1be61.xml
2025-12-04T12:13:00.1102346Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-063cd6c16f492c0b.xml
2025-12-04T12:13:00.1482169Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cdb46a62f836b20.xml
2025-12-04T12:13:00.1860805Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-50364e1db5a413f2.xml
2025-12-04T12:13:00.2414334Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-329d5d08d886772a.xml
2025-12-04T12:13:00.2843419Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8e3e317a92830ba6.xml
2025-12-04T12:13:00.3168467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-fba34ccbfe47be41.xml
2025-12-04T12:13:00.3485332Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67eecf299b49620e.xml
2025-12-04T12:13:00.3820983Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-689365daff97a217.xml
2025-12-04T12:13:00.4175809Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61d7df0dfd715866.xml
2025-12-04T12:13:00.4496378Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bbb315e2c7566474.xml
2025-12-04T12:13:00.4861464Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cbdafed15e10f46.xml
2025-12-04T12:13:00.5275608Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-250d1e9631b51e82.xml
2025-12-04T12:13:00.5603822Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-659d038e96b5f102.xml
2025-12-04T12:13:00.5975109Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-52f302a009c99a45.xml
2025-12-04T12:13:00.6380922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-edb5da82dbb96991.xml
2025-12-04T12:13:00.6722004Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-26ee580f1806e0f2.xml
2025-12-04T12:13:00.7127716Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b7cfd41a69868cc6.xml
2025-12-04T12:13:00.7454768Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4573f1e428dcb095.xml
2025-12-04T12:13:00.7735799Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2c1257cd859214a9.xml
2025-12-04T12:13:00.8151761Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6b4b9f12b6851f04.xml
2025-12-04T12:13:00.8461860Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-469eeaa86aae0ce8.xml
2025-12-04T12:13:00.8739945Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f89f3afb1f628785.xml
2025-12-04T12:13:00.9241707Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85fcc5c00efd74bd.xml
2025-12-04T12:13:00.9553714Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dcb0b47762861151.xml
2025-12-04T12:13:00.9867032Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31949d00d4596283.xml
2025-12-04T12:13:01.0211932Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b6b2b2997a48fffb.xml
2025-12-04T12:13:01.0511922Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b63fc96940c5dfca.xml
2025-12-04T12:13:01.1011373Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31833c8bcf86882f.xml
2025-12-04T12:13:01.1332869Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6dc75b6b5f29fbb9.xml
2025-12-04T12:13:01.1824730Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d97c974b9c50bec3.xml
2025-12-04T12:13:01.2231670Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-588d96b64bf97b8d.xml
2025-12-04T12:13:01.2762563Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-78303b7c44b57e72.xml
2025-12-04T12:13:01.3630045Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0fe2928d1b5c12d6.xml
2025-12-04T12:13:01.3944517Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85a344e8e648e5ca.xml
2025-12-04T12:13:01.4395226Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d0617f72a4b97751.xml
2025-12-04T12:13:01.4713462Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e1acf558219bc739.xml
2025-12-04T12:13:01.4992285Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d7d30d97e183551e.xml
2025-12-04T12:13:01.5296566Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-42d654f8293abc5a.xml
2025-12-04T12:13:01.5621502Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ee7c83ecdc672647.xml
2025-12-04T12:13:01.5934944Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0ceb6628ed982867.xml
2025-12-04T12:13:01.6283758Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8928a6b00b051b8.xml
2025-12-04T12:13:01.6570511Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-638b7d3a6684657f.xml
2025-12-04T12:13:01.6938916Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-28d8e196fd24a123.xml
2025-12-04T12:13:01.7475582Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-618da663b64859ce.xml
2025-12-04T12:13:01.7938971Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0193ecefca06b5b7.xml
2025-12-04T12:13:01.8431146Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-94e72f0552a6d934.xml
2025-12-04T12:13:01.8724552Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8e966a64a8d91b0.xml
2025-12-04T12:13:01.9046015Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6498f30a7931ed78.xml
2025-12-04T12:13:01.9422556Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35a9228a36f00ca8.xml
2025-12-04T12:13:01.9756303Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6d31e9c231a839ae.xml
2025-12-04T12:13:02.0029015Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-15938e0b51a5f238.xml
2025-12-04T12:13:02.0769461Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2f0e8060bc3a964c.xml
2025-12-04T12:13:02.1297786Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a7bcf286e5b1017b.xml
2025-12-04T12:13:02.1634029Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c21fd94b2a445d75.xml
2025-12-04T12:13:02.1918441Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e84bcc8fc890320e.xml
2025-12-04T12:13:02.2236278Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-71b1acdff50f0444.xml
2025-12-04T12:13:02.2607697Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f8b74fab1a7c01df.xml
2025-12-04T12:13:02.3076002Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cda23e9a2cebd271.xml
2025-12-04T12:13:02.3646059Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6ef0f921a65804fa.xml
2025-12-04T12:13:02.3936256Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4beacf6124c4825f.xml
2025-12-04T12:13:02.4339353Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8d5eb132574c3bbb.xml
2025-12-04T12:13:02.9163818Z Uploading logs for 57119749427 to S3
2025-12-04T12:13:03.0374297Z Uploading artifacts took 0.57 seconds
2025-12-04T12:13:03.0374934Z inductor/test_mix_order_reduction 1/2 failed!
2025-12-04T12:13:03.0378699Z Running test_transformers 1/1 ... [2025-12-04 12:13:03.037692][10740.647598395]
2025-12-04T12:13:03.0379235Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:13:03.0383715Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_transformers.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:13:03.038144]
2025-12-04T12:14:02.7418367Z 
2025-12-04T12:14:02.7419598Z test_transformers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_transformers_1.1_cd619bbaee31992c_.log
2025-12-04T12:14:03.7592680Z Running 10091 items in this shard: test/test_transformers.py::TestTransformersCUDA::test_bias_is_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_decoder_only_layer_cuda, test/test_transformers.py::TestTransformersCUDA::test_decoder_padding_and_src_mask_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_disable_fastpath_cuda, test/test_transformers.py::TestTransformersCUDA::test_encoder_is_causal_cuda, test/test_transformers.py::TestTransformersCUDA::test_encoder_padding_and_src_mask_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_is_causal_gpu_cuda, test/test_transformers.py::TestTransformersCUDA::test_kpm_mask_trailing_column_with_nested_tensor_cuda, test/test_transformers.py::TestTransformersCUDA::test_mask_check_fastpath_cuda, test/test_transformers.py::TestTransformersCUDA::test_math_backend_high_precision_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_1_bias_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_mha_native_args_nb_heads_8_bias_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim2_key_padding_mask_dim_2_float32_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim1_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_bool_cuda, test/test_transformers.py::TestTransformersCUDA::test_multiheadattention_fastpath_attn_mask_attn_mask_dim_3_key_padding_mask_dim_2_float32_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_3D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_no_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_3D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_4D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_no_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_0_4D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_3D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_no_attn_mask_dropout_p_0_2_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_3D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_2D_causal_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_causal_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_causal_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_4D_causal_attn_mask_dropout_p_0_5_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_no_attn_mask_dropout_p_0_0_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_no_attn_mask_dropout_p_0_2_cuda, test/test_transformers.py::TestTransformersCUDA::test_scaled_dot_product_attention_batch_size_5_4D_input_dim_no_attn_mask_dropout_p_0_5_cuda, test/test_transformers.py::TestTransformersCUDA::test_script_encoder_subclass_cuda, test/test_transformers.py::TestTransformersCUDA::test_script_mha_in_proj_weight_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_self_attn_TxT_attn_mask_cuda, test/test_transformers.py::TestTransformersCUDA::test_train_with_is_causal_cuda, test/test_transformers.py::TestTransformersCUDA::test_train_with_pad_and_catch_error_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformer_bias_is_none_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_False_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_False_training_True_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_False_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_False_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_batch_first_True_training_True_enable_nested_tensor_True_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_False_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_False_use_autocast_True_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_False_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_12_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_fastpath_use_torchscript_False_enable_nested_tensor_True_use_autocast_True_d_model_256_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_False_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_False_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_False_enable_nested_tensor_False_cuda, 
test/test_transformers.py::TestTransformersCUDA::test_transformerencoder_square_input_with_no_grad_True_training_True_enable_nested_tensor_False_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_3_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_no_fastpath_with_hooks_nhead_4_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_1_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_4_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_src_mask_nhead_8_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_cuda, test/test_transformers.py::TestTransformersCUDA::test_transformerencoderlayer_subclass_model_cuda, test/test_transformers.py::TestTransformersCUDA::test_with_nested_tensor_input_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_dispatch_fails_no_backend_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_atteention_large_bf16_nan_values_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_attention_fail_with_non_square_causal_attention_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_bfloat16_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_autocast_fp32_float16_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_193_dropout_p_0_2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_backward_failure_sm86plus_head_dim_256_dropout_p_0_2_cuda, 
test/test_transformers.py::TestSDPAFailureModesCUDA::test_flash_fail_fp32_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_error_cases_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_nested_broadcasting_requires_grad_failure_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_fused_kernels_seq_len_0_inputs_fused_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_attn_mask_present_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_broadcast_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_dim_3_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_head_dim_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_fused_inputs_invalid_dtype_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_1_dimensional_inputs_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_datatypes_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_inputs_different_devices_kernel2_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_last_dim_stride_kernel0_cuda, 
test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sdpa_kernel_grouped_query_attention_cuda_fused_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_invalid_sequence_lengths_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel0_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mask_invalid_last_dim_stride_kernel1_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_fail_with_batch_size_geq_65536_error_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_eff_attention_large_seq_len_uniform_attention_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_mem_efficient_fail_bfloat16_less_than_sm80_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_nested_fails_on_padding_head_dim_cuda, test/test_transformers.py::TestSDPAFailureModesCUDA::test_unaligned_tensors_cuda, test/test_transformers.py::TestSDPACUDA::test_scaled_dot_product_attention_fp16_overflow_cuda, test/test_transformers.py::TestSDPACUDA::test_scaled_dot_product_attention_math_with_negative_scale_kernel0_cuda, test/test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_False_cuda, test/test_transformers.py::TestSDPACUDA::test_sdp_math_gradcheck_contiguous_inputs_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_broken_166211_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_compiles_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_d256_heuristic_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_different_dk_dv_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_fail_d128_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_gqa_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_nonmodulo64seqlen_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_preserves_query_layout_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_seqlen1_dropout_heuristic_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_cudnn_attention_trivial_output_transpose_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_2048_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_127_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_2048_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_203_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_256_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_False_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale0_enable_gqa_True_n_heads1_cuda_bfloat16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_False_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads0_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_bfloat16_scale_l1_enable_gqa_True_n_heads1_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale0_enable_gqa_True_n_heads1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_False_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_579_head_dim_8_is_causal_True_dropout_p_0_48_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_32_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_256_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_256_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_64_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_0_float16_scale_l1_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale0_is_causal_True_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_False_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_flash_attention_vs_math_ref_grads_nestedtensor_batch_size_8_max_seq_len_q_32_max_seq_len_kv_32_head_dim_8_dropout_p_0_1_float16_scale_l1_is_causal_True_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_different_dk_dv_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_1024_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_1024_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_attention_vs_math_ref_grads_cudagraph_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_fused_kernel0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_backwards_throws_determinism_warning_fused_kernel0_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_False_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_False_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_False_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_False_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_False_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_kernel0_expand_q_batch_True_expand_k_batch_True_expand_v_batch_True_expand_q_num_heads_True_expand_k_num_heads_True_expand_v_num_heads_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_nested_broadcasting_query_dense_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_kernels_seq_len_1_inputs_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_dense_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_choice_type_nested_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_fused_sdp_priority_order_use_compile_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_long_sequence_mask_float32_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contig_mask_bug_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_attention_non_contiguous_mask_float32_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_eff_backwards_determinism_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_152_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_37_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_1_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_2_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_3_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_mask_variants_mask_dim_4_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_1_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_256_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_4_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_512_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_256_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_4_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_512_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_16_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_32_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_64_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float16_scale_l1_cuda_float16, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_False_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale0_cuda_float32, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_0_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale0_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float16_scale_l1_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale0_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_mem_efficient_attention_vs_math_ref_grads_batch_size_8_seq_len_q_8_seq_len_k_8_head_dim_8_is_causal_True_dropout_p_0_22_float32_scale_l1_cuda_float32, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_cudnn_nested_type_nested_is_contiguous_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_cudnn_nested_type_nested_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_dense_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_accuracy_type_nested_fused_kernel0_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_dense_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_scaled_dot_product_attention_fused_kernels_packed_type_nested_is_contiguous_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_choice_with_determinism_warn_only_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_False_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_False_is_causal_True_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_False_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_bfloat16_cuda_bfloat16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_flash_attention_grad_against_math_contiguous_inputs_True_is_causal_True_float16_cuda_float16, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_False_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_False_is_causal_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_False_cuda, 
test/test_transformers.py::TestSDPACudaOnlyCUDA::test_sdp_mem_efficient_grad_against_math_contiguous_inputs_True_is_causal_True_cuda, test/test_transformers.py::TestSDPACudaOnlyCUDA::test_singelton_head_dim_stride_ne_1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_causal_variant_CausalVariant_UPPER_LEFT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape1_cuda, 
test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_causal_variants_compile_causal_variant_CausalVariant_UPPER_LEFT_shape3_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_and_mask_fails_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape0_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape1_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape2_cuda, test/test_transformers.py::TestAttnBiasCUDA::test_is_causal_equals_upper_left_shape3_cuda
2025-12-04T12:14:04.7686107Z 
2025-12-04T12:14:04.7686457Z Finished test_transformers 1/1 ... [2025-12-04 12:14:02.768777][10800.378679093], took 1.00min
2025-12-04T12:14:04.7687630Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_transformers/test_transformers-314991beba6d5b67.xml
2025-12-04T12:14:04.7688704Z Running test_autograd 1/1 ... [2025-12-04 12:14:03.169345][10800.779250933]
2025-12-04T12:14:04.7689204Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:14:04.7690370Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autograd.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:14:03.169809]
2025-12-04T12:15:29.4088548Z 
2025-12-04T12:15:29.4089504Z test_autograd 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autograd_1.1_343bbb8e8e4f4e62_.log
2025-12-04T12:15:29.4349401Z Running 659 items in this shard: test/test_autograd.py::TestAutograd::test_access_saved_tensor_twice_without_recomputation_works, test/test_autograd.py::TestAutograd::test_accumulate_grad, test/test_autograd.py::TestAutograd::test_accumulate_grad_posthooks_can_observe_tensor_prehook, test/test_autograd.py::TestAutograd::test_accumulate_grad_posthooks_should_not_execute, test/test_autograd.py::TestAutograd::test_accumulate_grad_tensor_reference, test/test_autograd.py::TestAutograd::test_accumulate_grad_with_zero_numel_grad, test/test_autograd.py::TestAutograd::test_anomaly_assign_parent_cleanup, test/test_autograd.py::TestAutograd::test_anomaly_detect_nan, test/test_autograd.py::TestAutograd::test_anomaly_grad_warnings, test/test_autograd.py::TestAutograd::test_anomaly_mode_no_check_nan, test/test_autograd.py::TestAutograd::test_attribute_deletion, test/test_autograd.py::TestAutograd::test_autograd_inplace_view_of_view, test/test_autograd.py::TestAutograd::test_autograd_inplace_views_creation_meta, test/test_autograd.py::TestAutograd::test_autograd_inplace_views_cross_dtype, test/test_autograd.py::TestAutograd::test_autograd_multiple_views_python, test/test_autograd.py::TestAutograd::test_autograd_node_isinstance, test/test_autograd.py::TestAutograd::test_autograd_print_tensor, test/test_autograd.py::TestAutograd::test_autograd_python_custom_function_inplace, test/test_autograd.py::TestAutograd::test_autograd_simple_views_python, test/test_autograd.py::TestAutograd::test_autograd_views_codegen, test/test_autograd.py::TestAutograd::test_backward, test/test_autograd.py::TestAutograd::test_backward_badcalls, test/test_autograd.py::TestAutograd::test_backward_copy, test/test_autograd.py::TestAutograd::test_backward_create_graph_warns, test/test_autograd.py::TestAutograd::test_backward_hook_relative_ordering, test/test_autograd.py::TestAutograd::test_backward_no_grad, test/test_autograd.py::TestAutograd::test_backward_to_node, 
test/test_autograd.py::TestAutograd::test_backward_twice_retained_graph_with_saved_values, test/test_autograd.py::TestAutograd::test_backward_twice_retained_graph_without_saved_values, test/test_autograd.py::TestAutograd::test_backward_twice_with_saved_values, test/test_autograd.py::TestAutograd::test_backward_twice_without_saved_values, test/test_autograd.py::TestAutograd::test_backward_with_inputs, test/test_autograd.py::TestAutograd::test_backward_with_nonleaf_inputs, test/test_autograd.py::TestAutograd::test_backward_with_scalar_input, test/test_autograd.py::TestAutograd::test_calculate_shape_util, test/test_autograd.py::TestAutograd::test_callback_adds_callback, test/test_autograd.py::TestAutograd::test_callback_propagates_errors_from_device_thread, test/test_autograd.py::TestAutograd::test_cant_create_saved_tensors, test/test_autograd.py::TestAutograd::test_checkpoint_detects_non_determinism, test/test_autograd.py::TestAutograd::test_checkpoint_graph_execution_group, test/test_autograd.py::TestAutograd::test_checkpoint_sequential_warns_if_use_reentrant_not_passed_explcitly, test/test_autograd.py::TestAutograd::test_checkpoint_valid_reset_on_error, test/test_autograd.py::TestAutograd::test_checkpoint_warns_if_use_reentrant_not_passed_explcitly, test/test_autograd.py::TestAutograd::test_checkpointing, test/test_autograd.py::TestAutograd::test_checkpointing_non_reentrant_autocast_cpu, test/test_autograd.py::TestAutograd::test_checkpointing_non_reentrant_autocast_gpu, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_arbitrary_input_output, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_correct_grad, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_custom_function_works, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_dataparallel, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_False, 
test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_detached_tensor_use_reentrant_True, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_input_requires_grad_False, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_input_requires_grad_True, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_memory_savings, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_parameter_used_in_an_out, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_saved_object_identity, test/test_autograd.py::TestAutograd::test_checkpointing_without_reentrant_with_context_fn, test/test_autograd.py::TestAutograd::test_copy_slices_graph_task_updates, test/test_autograd.py::TestAutograd::test_create_graph_and_full_backward_hook_cycle, test/test_autograd.py::TestAutograd::test_current_graph_task_execution_order, test/test_autograd.py::TestAutograd::test_current_graph_task_id, test/test_autograd.py::TestAutograd::test_current_node, test/test_autograd.py::TestAutograd::test_custom_autograd_ac_early_stop, test/test_autograd.py::TestAutograd::test_custom_autograd_no_early_free, test/test_autograd.py::TestAutograd::test_custom_autograd_repeated_grad_grad, test/test_autograd.py::TestAutograd::test_custom_function_cycle, test/test_autograd.py::TestAutograd::test_custom_function_error, test/test_autograd.py::TestAutograd::test_custom_function_exception, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_forward_is_no_op, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_inplace_checks, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_non_differentiable, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_non_tensor_before_tensor_args, test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_view_checks, 
test/test_autograd.py::TestAutograd::test_custom_function_forward_mode_wrong_formula, test/test_autograd.py::TestAutograd::test_custom_function_inplace_on_non_default_view, test/test_autograd.py::TestAutograd::test_custom_function_inplace_on_view_of_leaf, test/test_autograd.py::TestAutograd::test_custom_function_local_inplace, test/test_autograd.py::TestAutograd::test_custom_function_mark_dirty_not_differentiable, test/test_autograd.py::TestAutograd::test_custom_function_mark_output_view_of_intermediate, test/test_autograd.py::TestAutograd::test_custom_function_no_tensors, test/test_autograd.py::TestAutograd::test_custom_function_non_tensor_inputs_outputs, test/test_autograd.py::TestAutograd::test_custom_function_preserve_torch_function_when_return_as_is, test/test_autograd.py::TestAutograd::test_custom_function_return_view_in_nograd, test/test_autograd.py::TestAutograd::test_custom_function_save_for_forward, test/test_autograd.py::TestAutograd::test_custom_function_saved_tensors, test/test_autograd.py::TestAutograd::test_custom_function_saving_mutated_view_no_leak, test/test_autograd.py::TestAutograd::test_custom_function_setup_context_multi_input, test/test_autograd.py::TestAutograd::test_custom_function_setup_context_multi_output, test/test_autograd.py::TestAutograd::test_custom_function_setup_context_simple, test/test_autograd.py::TestAutograd::test_custom_function_vmap_defaults, test/test_autograd.py::TestAutograd::test_deep_reentrant, test/test_autograd.py::TestAutograd::test_default_saved_tensors_hooks_double_backward, test/test_autograd.py::TestAutograd::test_dep_nograd, test/test_autograd.py::TestAutograd::test_dependent_backward, test/test_autograd.py::TestAutograd::test_detach, test/test_autograd.py::TestAutograd::test_detach_base, test/test_autograd.py::TestAutograd::test_detach_then_inplace_raises_in_autograd, test/test_autograd.py::TestAutograd::test_diagonal_expanded_v, test/test_autograd.py::TestAutograd::test_dir, 
test/test_autograd.py::TestAutograd::test_disabling_saved_tensor_hooks, test/test_autograd.py::TestAutograd::test_disabling_saved_tensor_hooks_nested, test/test_autograd.py::TestAutograd::test_dont_materialize_grads, test/test_autograd.py::TestAutograd::test_duplicate_backward_root, test/test_autograd.py::TestAutograd::test_enable_grad_decorator_no_paren, test/test_autograd.py::TestAutograd::test_first_grad_fn_access_in_no_grad_mode, test/test_autograd.py::TestAutograd::test_free_deep_graph, test/test_autograd.py::TestAutograd::test_free_deep_graph_complicated, test/test_autograd.py::TestAutograd::test_free_deep_graph_pyfunction, test/test_autograd.py::TestAutograd::test_full_backward_hook_double_backward, test/test_autograd.py::TestAutograd::test_function, test/test_autograd.py::TestAutograd::test_function_returns_input, test/test_autograd.py::TestAutograd::test_function_returns_undefined_tensor, test/test_autograd.py::TestAutograd::test_gc_in_destructor, test/test_autograd.py::TestAutograd::test_get_data_and_hooks_from_raw_saved_variable, test/test_autograd.py::TestAutograd::test_grad, test/test_autograd.py::TestAutograd::test_grad_badcalls, test/test_autograd.py::TestAutograd::test_grad_batched_grad, test/test_autograd.py::TestAutograd::test_grad_dtype, test/test_autograd.py::TestAutograd::test_grad_empty_inputs, test/test_autograd.py::TestAutograd::test_grad_fn_attr_bindings, test/test_autograd.py::TestAutograd::test_grad_fn_badcalls, test/test_autograd.py::TestAutograd::test_grad_fn_input_metadata, test/test_autograd.py::TestAutograd::test_grad_fn_prehooks, test/test_autograd.py::TestAutograd::test_grad_fn_prehooks_multiple_outputs, test/test_autograd.py::TestAutograd::test_grad_fn_prehooks_remove_hooks, test/test_autograd.py::TestAutograd::test_grad_materialize_grads, test/test_autograd.py::TestAutograd::test_grad_mode_class_decoration, test/test_autograd.py::TestAutograd::test_grad_mode_restored_reentrant, 
test/test_autograd.py::TestAutograd::test_grad_nonleaf, test/test_autograd.py::TestAutograd::test_grad_nonleaf_many_outputs, test/test_autograd.py::TestAutograd::test_grad_nonleaf_register_hook, test/test_autograd.py::TestAutograd::test_grad_thread_safety, test/test_autograd.py::TestAutograd::test_grad_to_node, test/test_autograd.py::TestAutograd::test_grad_to_node_inplace, test/test_autograd.py::TestAutograd::test_grad_to_node_materialize, test/test_autograd.py::TestAutograd::test_grad_to_node_multi, test/test_autograd.py::TestAutograd::test_grad_to_node_set, test/test_autograd.py::TestAutograd::test_grad_unreachable, test/test_autograd.py::TestAutograd::test_grad_unreachable_discovery, test/test_autograd.py::TestAutograd::test_gradcheck_backward_mul_by_grad_output, test/test_autograd.py::TestAutograd::test_gradcheck_check_batched_grad, test/test_autograd.py::TestAutograd::test_gradcheck_check_forward_or_backward_only, test/test_autograd.py::TestAutograd::test_gradcheck_check_no_differentiable_outputs, test/test_autograd.py::TestAutograd::test_gradcheck_complex_non_complex_outputs, test/test_autograd.py::TestAutograd::test_gradcheck_custom_error, test/test_autograd.py::TestAutograd::test_gradcheck_default_device_placement_context, test/test_autograd.py::TestAutograd::test_gradcheck_dense_and_sparse_inputs, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad_batched_grad, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad_respects_requires_grad, test/test_autograd.py::TestAutograd::test_gradcheck_forward_ad_runs_with_no_requires_grad, test/test_autograd.py::TestAutograd::test_gradcheck_get_analytical_jacobian, test/test_autograd.py::TestAutograd::test_gradcheck_get_numerical_jacobian, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout0, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout1, 
test/test_autograd.py::TestAutograd::test_gradcheck_input_layout2, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout3, test/test_autograd.py::TestAutograd::test_gradcheck_input_layout4, test/test_autograd.py::TestAutograd::test_gradcheck_jacobian_mismatch, test/test_autograd.py::TestAutograd::test_gradcheck_multiple_mkldnn_inputs, test/test_autograd.py::TestAutograd::test_gradcheck_nondeterministic, test/test_autograd.py::TestAutograd::test_gradcheck_output_shape_or_dtype_depend_on_values, test/test_autograd.py::TestAutograd::test_gradcheck_single_input, test/test_autograd.py::TestAutograd::test_gradcheck_test_outputs, test/test_autograd.py::TestAutograd::test_gradcheck_undefined_grad, test/test_autograd.py::TestAutograd::test_gradcheck_validates_input_mkldnn, test/test_autograd.py::TestAutograd::test_gradcheck_validates_inputs, test/test_autograd.py::TestAutograd::test_gradient_edge_graph_ownership, test/test_autograd.py::TestAutograd::test_gradient_edge_output, test/test_autograd.py::TestAutograd::test_graph_save_on_cpu, test/test_autograd.py::TestAutograd::test_graph_save_on_cpu_cuda, test/test_autograd.py::TestAutograd::test_hessian_vector, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_False_use_tensor_hook_False, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_False_use_tensor_hook_True, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_True_use_tensor_hook_False, test/test_autograd.py::TestAutograd::test_hook_closure_cycle_use_custom_function_True_use_tensor_hook_True, test/test_autograd.py::TestAutograd::test_hook_edge_case_when_called_with_grad, test/test_autograd.py::TestAutograd::test_hook_none, test/test_autograd.py::TestAutograd::test_hook_with_no_name, test/test_autograd.py::TestAutograd::test_hooks, test/test_autograd.py::TestAutograd::test_hooks_cpp, test/test_autograd.py::TestAutograd::test_increment_version, 
test/test_autograd.py::TestAutograd::test_index_backward_does_not_save_tensor, test/test_autograd.py::TestAutograd::test_indexing, test/test_autograd.py::TestAutograd::test_indexing_duplicates, test/test_autograd.py::TestAutograd::test_inplace, test/test_autograd.py::TestAutograd::test_inplace_not_requires_grad, test/test_autograd.py::TestAutograd::test_inplace_on_view_backward, test/test_autograd.py::TestAutograd::test_inplace_on_view_leaf_errors, test/test_autograd.py::TestAutograd::test_inplace_on_view_saved_output, test/test_autograd.py::TestAutograd::test_inplace_on_view_weak_grad_fn, test/test_autograd.py::TestAutograd::test_input_buffer_accum, test/test_autograd.py::TestAutograd::test_integer_outputs, test/test_autograd.py::TestAutograd::test_invalid_gradients, test/test_autograd.py::TestAutograd::test_isolated_node, test/test_autograd.py::TestAutograd::test_leaf_assignment, test/test_autograd.py::TestAutograd::test_legacy_function_deprecation_exception, test/test_autograd.py::TestAutograd::test_lobpcg, test/test_autograd.py::TestAutograd::test_mark_non_differentiable, test/test_autograd.py::TestAutograd::test_mark_non_differentiable_mixed, test/test_autograd.py::TestAutograd::test_mark_non_differentiable_none, test/test_autograd.py::TestAutograd::test_materialize_grads, test/test_autograd.py::TestAutograd::test_multi_backward, test/test_autograd.py::TestAutograd::test_multi_backward_no_grad, test/test_autograd.py::TestAutograd::test_multi_grad_all_hooks, test/test_autograd.py::TestAutograd::test_multi_grad_any_hooks, test/test_autograd.py::TestAutograd::test_multi_grad_hooks_invalid_mode, test/test_autograd.py::TestAutograd::test_multiple_insert_removal_caching, test/test_autograd.py::TestAutograd::test_named_tensor_for_complex_views, test/test_autograd.py::TestAutograd::test_naughty_anomaly_access, test/test_autograd.py::TestAutograd::test_naughty_autograd_function_attribute_access, 
test/test_autograd.py::TestAutograd::test_naughty_autograd_function_stashing_ctx, test/test_autograd.py::TestAutograd::test_nested_anomaly_detect_nan, test/test_autograd.py::TestAutograd::test_nested_anomaly_printstack_cleanup, test/test_autograd.py::TestAutograd::test_next_functions, test/test_autograd.py::TestAutograd::test_no_grad, test/test_autograd.py::TestAutograd::test_no_grad_assignment, test/test_autograd.py::TestAutograd::test_no_grad_copy, test/test_autograd.py::TestAutograd::test_no_grad_copy_sparse, test/test_autograd.py::TestAutograd::test_no_grad_input, test/test_autograd.py::TestAutograd::test_no_grad_modifies_version, test/test_autograd.py::TestAutograd::test_no_grad_python_function, test/test_autograd.py::TestAutograd::test_no_requires_grad_inplace, test/test_autograd.py::TestAutograd::test_no_unnecessary_save, test/test_autograd.py::TestAutograd::test_no_unnecessary_unwrapping, test/test_autograd.py::TestAutograd::test_node_ordering_when_none_returned, test/test_autograd.py::TestAutograd::test_node_post_hook_registered_during_unpack_hook, test/test_autograd.py::TestAutograd::test_not_implemented_fwad, test/test_autograd.py::TestAutograd::test_not_implemented_grad, test/test_autograd.py::TestAutograd::test_numpy_requires_grad, test/test_autograd.py::TestAutograd::test_once_differentiable, test/test_autograd.py::TestAutograd::test_out_variant_raises_when_inputs_require_grad, test/test_autograd.py::TestAutograd::test_pack_hook_with_inplace_modification_should_fail, test/test_autograd.py::TestAutograd::test_pickle, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_e2e, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_gets_cleaned_up, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_multiple_hooks, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_multiple_tensors, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_on_non_leaf, 
test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_ordering, test/test_autograd.py::TestAutograd::test_post_accumulate_grad_hook_returns_not_None, test/test_autograd.py::TestAutograd::test_pow_zero_tensor_gradient, test/test_autograd.py::TestAutograd::test_power_function, test/test_autograd.py::TestAutograd::test_prehook_ordering, test/test_autograd.py::TestAutograd::test_profiler, test/test_autograd.py::TestAutograd::test_profiler_aggregation_fake, test/test_autograd.py::TestAutograd::test_profiler_aggregation_lstm, test/test_autograd.py::TestAutograd::test_profiler_aggregation_table, test/test_autograd.py::TestAutograd::test_profiler_function_event_avg, test/test_autograd.py::TestAutograd::test_profiler_propagation, test/test_autograd.py::TestAutograd::test_profiler_seq_nr, test/test_autograd.py::TestAutograd::test_profiler_shapes, test/test_autograd.py::TestAutograd::test_profiler_unboxed_only, test/test_autograd.py::TestAutograd::test_pynode_destruction_deadlock, test/test_autograd.py::TestAutograd::test_record_function, test/test_autograd.py::TestAutograd::test_record_function_callbacks, test/test_autograd.py::TestAutograd::test_record_function_legacy, test/test_autograd.py::TestAutograd::test_record_function_multithreaded, test/test_autograd.py::TestAutograd::test_reentrant_child_error, test/test_autograd.py::TestAutograd::test_reentrant_priority, test/test_autograd.py::TestAutograd::test_reentrant_with_callbacks_both_depths, test/test_autograd.py::TestAutograd::test_reentrant_with_callbacks_depth_0, test/test_autograd.py::TestAutograd::test_reentrant_with_callbacks_depth_1, test/test_autograd.py::TestAutograd::test_reentrant_with_leaf_variable_hook, test/test_autograd.py::TestAutograd::test_reentrant_with_non_leaf_variable_hook, test/test_autograd.py::TestAutograd::test_requires_grad, test/test_autograd.py::TestAutograd::test_requires_grad_, test/test_autograd.py::TestAutograd::test_requires_grad_inplace, 
test/test_autograd.py::TestAutograd::test_retain_grad, test/test_autograd.py::TestAutograd::test_retain_grad_cycle, test/test_autograd.py::TestAutograd::test_retain_grad_inplace, test/test_autograd.py::TestAutograd::test_retain_grad_inplace_over_view, test/test_autograd.py::TestAutograd::test_retains_grad_can_always_observe_tensor_prehook, test/test_autograd.py::TestAutograd::test_retains_grad_inplace_multiple_outputs, test/test_autograd.py::TestAutograd::test_return_duplicate, test/test_autograd.py::TestAutograd::test_return_duplicate_inplace, test/test_autograd.py::TestAutograd::test_return_leaf, test/test_autograd.py::TestAutograd::test_return_leaf_inplace, test/test_autograd.py::TestAutograd::test_save_none_for_backward, test/test_autograd.py::TestAutograd::test_save_on_cpu_and_checkpoint, test/test_autograd.py::TestAutograd::test_save_output_nr, test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_custom_error_propagation, test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_custom_function_intermediates, test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_extra_enter_during_bw_no_leak, test/test_autograd.py::TestAutograd::test_saved_tensor_hooks_extra_exit_during_bw_no_crash, test/test_autograd.py::TestAutograd::test_saved_tensors_hook_version_counter_not_shared, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_default_hooks, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_did_not_save_original_with_hooks, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_saved_original_with_default_hooks, test/test_autograd.py::TestAutograd::test_saved_variable_packing_unpacking_saved_original_with_hooks, test/test_autograd.py::TestAutograd::test_saved_variable_saved_original_inplace_detach, test/test_autograd.py::TestAutograd::test_saved_variable_version_counter, test/test_autograd.py::TestAutograd::test_saved_variables_deprecated, 
test/test_autograd.py::TestAutograd::test_saving_variable_to_disk, test/test_autograd.py::TestAutograd::test_scalar_grad_mixed_device, test/test_autograd.py::TestAutograd::test_select_expanded_v, test/test_autograd.py::TestAutograd::test_select_sum, test/test_autograd.py::TestAutograd::test_set_data_preserve_pyobj, test/test_autograd.py::TestAutograd::test_set_data_self_requires_grad, test/test_autograd.py::TestAutograd::test_set_data_tensorimpl_type, test/test_autograd.py::TestAutograd::test_set_grad_coroutines, test/test_autograd.py::TestAutograd::test_set_grad_coroutines_benign_exceptions, test/test_autograd.py::TestAutograd::test_set_grad_coroutines_critical_exceptions, test/test_autograd.py::TestAutograd::test_set_grad_coroutines_exit, test/test_autograd.py::TestAutograd::test_set_grad_enabled, test/test_autograd.py::TestAutograd::test_set_grad_enabled_wraps, test/test_autograd.py::TestAutograd::test_set_grad_generator_functions, test/test_autograd.py::TestAutograd::test_set_grad_generator_functions_recursive, test/test_autograd.py::TestAutograd::test_set_materialize_non_diff_grads, test/test_autograd.py::TestAutograd::test_setitem, test/test_autograd.py::TestAutograd::test_setitem_mask, test/test_autograd.py::TestAutograd::test_setting_default_saved_variable_hooks_twice_should_not_fail, test/test_autograd.py::TestAutograd::test_setting_default_saved_variable_hooks_twice_should_use_inner, test/test_autograd.py::TestAutograd::test_setup_context_when_forward_has_default_args, test/test_autograd.py::TestAutograd::test_shape, test/test_autograd.py::TestAutograd::test_sharded_grad, test/test_autograd.py::TestAutograd::test_simple_reentrant, test/test_autograd.py::TestAutograd::test_slice_expanded_v, test/test_autograd.py::TestAutograd::test_sparse_gather_both_scalar, test/test_autograd.py::TestAutograd::test_sparse_gather_dim0, test/test_autograd.py::TestAutograd::test_sparse_gather_dim1, test/test_autograd.py::TestAutograd::test_sparse_gather_dim_neg, 
test/test_autograd.py::TestAutograd::test_sparse_gather_ind_scalar, test/test_autograd.py::TestAutograd::test_sparse_gather_x_scalar, test/test_autograd.py::TestAutograd::test_sparse_mm_backward, test/test_autograd.py::TestAutograd::test_tensor_grad_warnings, test/test_autograd.py::TestAutograd::test_tensor_hooks_inplace, test/test_autograd.py::TestAutograd::test_tensor_hooks_inplace_multiple_outputs, test/test_autograd.py::TestAutograd::test_tensor_hooks_inplace_over_view, test/test_autograd.py::TestAutograd::test_thread_shutdown, test/test_autograd.py::TestAutograd::test_to_sparse_backward, test/test_autograd.py::TestAutograd::test_too_many_grads, test/test_autograd.py::TestAutograd::test_type_conversions, test/test_autograd.py::TestAutograd::test_unpack_hooks_exec_count, test/test_autograd.py::TestAutograd::test_unrelated_inputs, test/test_autograd.py::TestAutograd::test_unsafe_set_version_counter, test/test_autograd.py::TestAutograd::test_unused_grad_requires_grad_with_materialize, test/test_autograd.py::TestAutograd::test_unused_output, test/test_autograd.py::TestAutograd::test_var_mean_differentiable, test/test_autograd.py::TestAutograd::test_variable_traverse, test/test_autograd.py::TestAutograd::test_version_counter, test/test_autograd.py::TestAutograd::test_view_func_replay, test/test_autograd.py::TestAutograd::test_view_func_replay_with_modified_state, test/test_autograd.py::TestAutograd::test_view_replay_enabled, test/test_autograd.py::TestAutograd::test_volatile_deprecated, test/test_autograd.py::TestAutograd::test_will_engine_execute_node, test/test_autograd.py::TestAutograd::test_wrapped_number_saved_tensors_hooks, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_inplace_on_view_not_same_layout, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_inplace_on_view_same_layout, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_metadata_check_for_storage_numel_skipped, 
test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_out_of_place_basic, test/test_autograd.py::TestAutogradForwardModeBatchedGrad::test_out_of_place_not_same_layout, test/test_autograd.py::TestAutogradForwardMode::test_advanced_packing_unpacking, test/test_autograd.py::TestAutogradForwardMode::test_backward_graph_destruction, test/test_autograd.py::TestAutogradForwardMode::test_basic_packing_unpacking, test/test_autograd.py::TestAutogradForwardMode::test_codegen_ignores_undefined_outputs, test/test_autograd.py::TestAutogradForwardMode::test_create_new_zeros_with_same_meta, test/test_autograd.py::TestAutogradForwardMode::test_default_level, test/test_autograd.py::TestAutogradForwardMode::test_detach_view_tracking, test/test_autograd.py::TestAutogradForwardMode::test_forward_level_cleanup, test/test_autograd.py::TestAutogradForwardMode::test_fwd_grad_enabled, test/test_autograd.py::TestAutogradForwardMode::test_grad_cleanup, test/test_autograd.py::TestAutogradForwardMode::test_make_dual_forbid_integral_dtype, test/test_autograd.py::TestAutogradForwardMode::test_make_dual_inference_tensor_in_inference_mode, test/test_autograd.py::TestAutogradForwardMode::test_make_dual_torch_dispatch, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_check_conj, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_checks_ignores_size_zero, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_checks_storage_numel, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_ignore_storage_offset_for_zero_numel_tensor, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_when_primal_has_conj_bit, test/test_autograd.py::TestAutogradForwardMode::test_metadata_check_when_primal_has_neg_bit, test/test_autograd.py::TestAutogradForwardMode::test_nested_level, test/test_autograd.py::TestAutogradForwardMode::test_non_differentiable, test/test_autograd.py::TestAutogradForwardMode::test_out_variant, 
test/test_autograd.py::TestAutogradForwardMode::test_print, test/test_autograd.py::TestAutogradForwardMode::test_set_fw_grad_having_own_fw_grad_at_same_level, test/test_autograd.py::TestAutogradForwardMode::test_set_fwd_grad_enabled, test/test_autograd.py::TestAutogradForwardMode::test_size_check, test/test_autograd.py::TestAutogradForwardMode::test_view_inplace_always_creates_a_view, test/test_autograd.py::TestAutogradForwardMode::test_view_inplace_differentiable_views, test/test_autograd.py::TestAutogradForwardMode::test_view_inplace_non_differentiable_views, test/test_autograd.py::TestAllowMutationOnSaved::test_backward_out_of_context, test/test_autograd.py::TestAllowMutationOnSaved::test_basic, test/test_autograd.py::TestAllowMutationOnSaved::test_disallow_nesting, test/test_autograd.py::TestAllowMutationOnSaved::test_double_backward, test/test_autograd.py::TestAllowMutationOnSaved::test_inplace_foreach, test/test_autograd.py::TestAllowMutationOnSaved::test_save_base_and_modify_view, test/test_autograd.py::TestAllowMutationOnSaved::test_save_view_modify_base, test/test_autograd.py::TestAllowMutationOnSaved::test_saved_but_not_anymore, test/test_autograd.py::TestAllowMutationOnSaved::test_saved_same_tensor_different_versions, test/test_autograd.py::TestAllowMutationOnSaved::test_saved_same_tensor_many_times, test/test_autograd.py::TestAllowMutationOnSaved::test_views, test/test_autograd.py::TestAllowMutationOnSaved::test_with_math_views, test/test_autograd.py::TestAllowMutationOnSaved::test_with_out_variant, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_context_manager, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_decorator, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_existing_autograd_session, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_handle_direct_view_on_rebase, 
test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_handle_indirect_view_on_rebase, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_inf_mode_functional_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_inf_mode_inplace_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_inf_mode_view_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_normal_mode_functional_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_normal_mode_inplace_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_inf_tensor_in_normal_mode_view_op, test/test_autograd.py::TestAutogradInferenceMode::test_inference_mode_tensor_creation, test/test_autograd.py::TestAutogradInferenceMode::test_mix_inference_and_normal_tensor_functional_op, test/test_autograd.py::TestAutogradInferenceMode::test_mix_inference_and_normal_tensor_inplace_op, test/test_autograd.py::TestAutogradInferenceMode::test_mix_inference_and_normal_tensor_view_op, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_inplace_output_in_inference_mode, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_inplace_output_in_normal_mode, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_view_output_in_inference_mode, test/test_autograd.py::TestAutogradInferenceMode::test_normal_tensor_view_output_in_normal_mode, test/test_autograd.py::TestMultithreadAutograd::test_cat_stack_r_to_c, test/test_autograd.py::TestMultithreadAutograd::test_custom_function_propagates_errors_from_device_thread, test/test_autograd.py::TestMultithreadAutograd::test_dataparallel_saved_tensors_hooks, test/test_autograd.py::TestMultithreadAutograd::test_fork_join_in_middle, test/test_autograd.py::TestMultithreadAutograd::test_multi_grad_all_hooks, 
test/test_autograd.py::TestMultithreadAutograd::test_multi_grad_any_hooks, test/test_autograd.py::TestMultithreadAutograd::test_multithreaded_exception_propagation, test/test_autograd.py::TestMultithreadAutograd::test_preserve_backtrace, test/test_autograd.py::TestMultithreadAutograd::test_python_thread_in_middle, test/test_autograd.py::TestMultithreadAutograd::test_set_multithreading_enabled_as_context_manager_and_function, test/test_autograd.py::TestMultithreadAutograd::test_simple_backward, test/test_autograd.py::TestMultithreadAutograd::test_simple_backward_same_input, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_kwargs_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_kwargs_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_non_tensor_inputs_and_outputs_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_reentrant_backwards_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_reentrant_backwards_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_same_graph_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_same_graph_early_stop_True, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_set_early_stop, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_set_early_stop_no_recompution_needed, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_two_children_early_stop_False, test/test_autograd.py::TestNestedCheckpoint::test_nested_checkpoint_two_children_early_stop_True, 
test/test_autograd.py::TestSelectiveActivationCheckpoint::test_bad_inputs, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_can_only_trigger_recompute_once, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_flops_and_mem, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_function_with_more_than_one_output, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_function_with_non_tensor_output, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_output_already_has_autograd_meta, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_policy_with_state, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_storage_lifetime, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_subclass_dispatching_sizes, test/test_autograd.py::TestSelectiveActivationCheckpoint::test_version_counter, test/test_autograd.py::TestAutogradComplex::test_view_func_for_complex_views, test/test_autograd.py::TestAutogradComplex::test_view_with_multi_output, test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_cuda_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_cuda_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_construct_standard_basis_for_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_create_graph_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_base_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_vectorize_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_strict_vectorize_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_err_check_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_match_vhp_hvp_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_match_vhp_hvp_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_output_vectorized_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_output_vectorized_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_scalar_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_multi_input_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_multi_input_logging_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_simple_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_simple_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_unrelated_outputs_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_correctness_unrelated_outputs_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_raises_no_warnings_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hessian_vectorize_raises_no_warnings_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_create_graph_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_hvp_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_True_base_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_jacobian_create_graph_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_vectorize_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_strict_vectorize_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_False_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_False_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_True_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_err_check_vectorize_True_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_match_vjp_jvp_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_match_vjp_jvp_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_vectorized_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_output_vectorized_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_vectorized_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_scalar_vectorized_logging_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_devices_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_devices_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_dtype_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_different_dtype_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_multi_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_multi_input_multi_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_simple_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_simple_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_unrelated_outputs_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_unrelated_outputs_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_zero_dim_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_correctness_zero_dim_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_raises_no_warnings_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jacobian_vectorize_raises_no_warnings_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_create_graph_logging_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_jvp_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_create_graph_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vhp_scalar_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_create_graph_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_create_graph_logging_tensor, 
test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_strict_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_err_check_strict_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_no_grad_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_no_grad_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_output_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_output_logging_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_scalar_base_tensor, test/test_autograd.py::TestAutogradFunctional::test_vjp_scalar_logging_tensor, test/test_autograd.py::TestAutogradLogging::test_logging, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_advanced_indexing_backwards_large_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_advanced_indexing_backwards_memory_format_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_backward_device_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_complex_scalar_backward_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy__cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy_forward_ad_broadcasting_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy_forward_ad_same_layout_copies_grad_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_copy_r_to_c_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_cross_device_reentrant_autograd_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_free_unneeded_tensor_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_grad_assignment_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_gradcheck_input_output_different_device_cuda, 
test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_multiple_output_view_of_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_backprop_base_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_backprop_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_backprop_view_of_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_gradcheck_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_makes_base_require_grad_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_modify_base_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_multi_output_safe_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_multi_output_unsafe_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_multiple_outputs_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_non_contig_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_of_multiple_output_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_of_view_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_python_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_then_no_grad_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inplace_on_view_undefined_grad_output_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_inputbuffer_add_multidevice_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_min_max_median_backprops_to_all_values_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_mv_grad_stride_0_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_non_differentiable_ops_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_parameter_resize_cuda, 
test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_pin_memory_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_pow_real_negative_base_complex_exponent_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_profiler_emit_itt_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_profiler_emit_nvtx_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_pyscalar_conversions_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_reentrant_parent_error_on_cpu_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_requires_grad_factory_cuda_float32, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_requires_grad_factory_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_resize_version_bump_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_rnn_backward_to_input_but_not_parameters_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_scatter_index_reduce_amin_amax_backprops_to_all_values_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_scatter_index_reduce_prod_gradgrad_error_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_float16, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_float32, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int16, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int32, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_set_requires_grad_only_for_floats_cuda_int8, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_simple_reentrant_cross_device_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_backward_cuda_complex128, 
test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_backward_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_ctor_getter_backward_cuda_complex128, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_ctor_getter_backward_cuda_float64, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_sparse_mask_autograd_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_strided_leaf_grad_layout_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_to_r_to_c_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_unused_output_device_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_warning_in_backward_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_where_functional_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_where_scalar_cuda, test/test_autograd.py::TestAutogradDeviceTypeCUDA::test_zero_dim_param_mixed_device_grad_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_atan2_zero_gradient_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_autograd_composite_implicit_and_dispatch_registration_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_autograd_multiple_dispatch_registrations_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_backward_single_threaded_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_backward_tls_stash_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_foward_mode_AD_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_is_retain_graph_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_per_dispatch_key_input_saving_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_set_sequence_nr_cuda, test/test_autograd.py::TestAutogradMultipleDispatchCUDA::test_view_copy_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_consumer_to_multi_producer_case_4_correctness_cuda, 
test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_consumer_to_single_producer_case_2_correctness_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_consumer_to_single_producer_case_3_correctness_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_consumer_to_single_producer_case_3_correctness_non_default_ambient_stream_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_consumer_to_single_producer_case_4_correctness_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_side_stream_backward_overlap_cuda, test/test_autograd.py::TestAutogradStreamSynchronizationCUDA::test_warn_on_accumulate_grad_stream_mismatch_flag_cuda
2025-12-04T12:15:29.4605023Z 
2025-12-04T12:15:29.4605360Z Finished test_autograd 1/1 ... [2025-12-04 12:15:29.409626][10887.019532931], took 1.44min
2025-12-04T12:15:29.4606446Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_autograd/test_autograd-9411f135e03cf921.xml
2025-12-04T12:15:29.5275811Z Running test_sparse 1/2 ... [2025-12-04 12:15:29.527270][10887.137177326]
2025-12-04T12:15:29.5276536Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:15:29.5279842Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse.py', '--shard-id=1', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:15:29.527688]
2025-12-04T12:20:11.2431182Z 
2025-12-04T12:20:11.2432092Z test_sparse 1/2 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_1.2_170c4a4cb63931fe_.log
2025-12-04T12:20:11.3072714Z Running 1525 items in this shard: test/test_sparse.py::TestSparseOneOff::test_cuda_from_cpu, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_add_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_fake_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_meta_SparseCSR_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_print_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSC_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_sum_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_to_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseBSR_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_fake_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCOO_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSC_float64, test/test_sparse.py::TestSparseMeta::test_zeros_like_meta_SparseCSR_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asin_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_ceil_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_conj_physical_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_expm1_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_signbit_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_inplace_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atan_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_deg2rad_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isnan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isneginf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_isposinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_float32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sign_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_out_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_asinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_ceil_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isnan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sgn_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sign_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_signbit_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sinh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_tanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int32, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_consistency_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_conj_physical_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_neg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_positive_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sign_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_fn_grad_tan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asin_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_float64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atan_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_ceil_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_conj_physical_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_deg2rad_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_erfinv_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_expm1_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isneginf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nan_to_num_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_nn_functional_relu_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sgn_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_sqrt_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_tanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zero_dims_trunc_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_abs_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_asinh_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atan_cuda_int8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_atanh_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_ceil_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_conj_physical_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_deg2rad_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_erfinv_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_expm1_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_floor_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_frac_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isnan_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isneginf_cuda_uint8, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_isposinf_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_log1p_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nan_to_num_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_neg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_nn_functional_relu_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_complex64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_positive_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_rad2deg_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_round_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sgn_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sign_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_signbit_cuda_int8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int16, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sin_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sinh_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_complex64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_sqrt_cuda_uint8, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int16, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tan_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_complex128, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_float64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_tanh_cuda_int64, 
test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int32, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int64, test/test_sparse.py::TestSparseUnaryUfuncsCUDA::test_sparse_zeros_trunc_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amax_cuda_int8, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_amin_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_float64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_prod_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_bfloat16, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_complex128, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_float32, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int16, 
test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int64, test/test_sparse.py::TestSparseMaskedReductionsCUDA::test_future_empty_dim_masked_sum_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy__cuda_float64, test/test_sparse.py::TestSparseCUDA::test_Sparse_to_Sparse_copy_multi_gpu_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_dense_sparse_mismatch_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_noncontiguous_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_add_sub_nnz_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_add_zeros_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_any_cuda, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_asin_arcsin_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_assign_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_basic_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_basic_ops_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_bmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_bmm_deterministic_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_change_tensor_metadata_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_clone_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_clone_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_coalesce_accepts_large_tensor_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_bfloat16, 
test/test_sparse.py::TestSparseCUDA::test_coalesce_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_coalesce_transpose_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_contig_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_contig_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_ctor_size_checks_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_div_by_sparse_error_cuda, test/test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_div_rounding_mode_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_dsmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_dtypes_cuda, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_False_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_empty_full_requires_grad_True_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_empty_like_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float16, 
test/test_sparse.py::TestSparseCUDA::test_factory_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_nnz_zero_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_factory_size_check_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_factory_type_inference_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_floor_divide_by_sparse_error_cuda, test/test_sparse.py::TestSparseCUDA::test_index_select_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_exhaustive_index_large_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_index_select_parallelization_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_is_nonzero_cuda, test/test_sparse.py::TestSparseCUDA::test_is_sparse_cuda, test/test_sparse.py::TestSparseCUDA::test_legacy_new_device_cuda, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_int64, test/test_sparse.py::TestSparseCUDA::test_log1p_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_log_softmax_float_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_log_softmax_zero_nnz_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_mv_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_narrow_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_neg_negative_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_new_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_new_device_multi_gpu_cuda, test/test_sparse.py::TestSparseCUDA::test_norm_cuda_complex128, 
test/test_sparse.py::TestSparseCUDA::test_permute_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_pickle_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_print_coalesced_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_print_uncoalesced_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_resize_as_cuda, test/test_sparse.py::TestSparseCUDA::test_resize_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_resize_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_saddmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_saddmm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_scalar_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_select_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_select_no_type_promotion_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_shared_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_softmax_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_softmax_zero_nnz_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_coalesce_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_add_out_bfloat16_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_addmm_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_broadcast_to_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_float64, 
test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int16, test/test_sparse.py::TestSparseCUDA::test_sparse_dense_mul_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_backward_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_mask_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_sparse_matmul_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_mm_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_sparse_mul_cuda_uint8, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int32, test/test_sparse.py::TestSparseCUDA::test_sparse_spdiags_cuda_int8, test/test_sparse.py::TestSparseCUDA::test_sparse_sum_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_sparse_to_numpy_cuda, test/test_sparse.py::TestSparseCUDA::test_sspaddmm_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_storage_not_null_cuda, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_bool, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_sum_cuda_int16, 
test/test_sparse.py::TestSparseCUDA::test_t_empty_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_masked_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_to_dense_hybrid_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_masked_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_complex64, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float16, test/test_sparse.py::TestSparseCUDA::test_to_dense_with_gradcheck_sparse_cuda_float32, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_bfloat16, test/test_sparse.py::TestSparseCUDA::test_to_sparse_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_transpose_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_transpose_cuda_float64, test/test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_complex128, test/test_sparse.py::TestSparseCUDA::test_unsqueeze_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSC_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseBSR_nonmasked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCOO_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_fast_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_masked_slow_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_as_sparse_gradcheck_SparseCSC_nonmasked_slow_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_binary_operation_mul_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_check_sparse_tensor_invariants_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_autograd_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_mismatched_pinned_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pin_memory_SparseCOO_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_constructor_pinned_memory_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_dataloader_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_generate_simple_inputs_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSC_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_masked_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseBSR_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCOO_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSC_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_masked_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_fast_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_mm_SparseCSR_sparse_slow_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSC_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseBSR_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_masked_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCOO_int64_sparse_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_masked_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSC_int64_sparse_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_gradcheck_to_dense_SparseCSR_int64_sparse_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_randn_like_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSC_cuda_int64, 
test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSC_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_like_fns_zeros_like_SparseCSR_cuda_int64, 
test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_method_pin_memory_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_backward_sum_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSC_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_complex32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_reductions_sum_SparseCSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSC_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseBSR_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCOO_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSC_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_sparse_mask_SparseCSR_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int32_cuda_int8, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_dense_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSC_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_int64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseBSR_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_bool, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseBSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int16, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCOO_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseBSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_complex64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCOO_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSC_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int32_cuda_int64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseBSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCOO_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int32, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int32_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_SparseCSR_SparseCSR_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_complex128, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_int8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSC_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseBSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_int64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCOO_int64_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSC_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_float64, 
test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_int64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int32_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bfloat16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_bool, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex128, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_complex64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float16, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_int32, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_Strided_SparseCSR_int64_cuda_uint8, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCOO_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_to_sparse_identity_SparseCSC_cuda_float64, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_ccol_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSC_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_coalesce_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_col_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_crow_indices_Strided_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_indices_SparseCSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_is_coalesced_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_row_indices_SparseBSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSC_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseBSR_cuda, 
test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCOO_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_SparseCSR_cuda, test/test_sparse.py::TestSparseAnyCUDA::test_unsupported_backend_error_message_values_Strided_cuda
2025-12-04T12:20:11.3699945Z 
2025-12-04T12:20:11.3700271Z Finished test_sparse 1/2 ... [2025-12-04 12:20:11.245129][11168.855034908], took 4.70min
2025-12-04T12:20:11.3701480Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse/test_sparse-8ac5504ea5d63e83.xml
2025-12-04T12:20:11.3878103Z Running test_decomp 2/17 ... [2025-12-04 12:20:11.387525][11168.997430279]
2025-12-04T12:20:11.3878614Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:20:11.3881961Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=2', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:20:11.387961]
2025-12-04T12:29:56.4209807Z 
2025-12-04T12:29:56.4210689Z test_decomp 2/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_2.17_4858d88ccf44ed88_.log
2025-12-04T12:29:56.4414942Z Running 535 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rxor___cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__batch_norm_with_update_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__native_batch_norm_legit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__unsafe_masked_index_put_accumulate_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_abs_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcdiv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addcmul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_baddbmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_right_shift_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cov_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumulative_trapezoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_floor_rounding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fliplr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_heaviside_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_item_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_kthvalue_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eig_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_power_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svdvals_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_normal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_normal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logdet_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logaddexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nan_to_num_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_layer_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nextafter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nextafter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv1d_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_glu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bilinear_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_kl_div_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool2d_grad_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multilabel_soft_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_complex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softplus_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_threshold_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_bilinear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_in_place_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_neg_3_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_mean_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_searchsorted_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_gaussian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_kaiser_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signbit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_mm_reduce_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_w_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_entr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_square_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_multiple_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_along_dim_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_topk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_uniform_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vdot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_real_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_addcmul_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_asin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_clamp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_index_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_std_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_t_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_deg2rad_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft2_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_frac_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_gcd_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_grid_sampler_2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_i0_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_lcm_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_lerp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_not_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_lt_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_mv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardsigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softplus_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_neg_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i0e_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sqrt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_multiple_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_LSTM_eval_mode_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_rnn_decomp_module_nn_RNN_train_mode_cuda_float32, test/test_decomp.py::HasDecompTest::test_aten_core_operators
2025-12-04T12:29:56.4612784Z 
2025-12-04T12:29:56.4613099Z Finished test_decomp 2/17 ... [2025-12-04 12:29:56.421665][11754.031569993], took 9.75min
2025-12-04T12:29:56.4614238Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-b93e416e4714efc8.xml
2025-12-04T12:29:56.5374105Z Running test_decomp 7/17 ... [2025-12-04 12:29:56.537081][11754.146987533]
2025-12-04T12:29:56.5374648Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:29:56.5377444Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=7', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:29:56.537519]
2025-12-04T12:40:05.6661963Z 
2025-12-04T12:40:05.6663035Z test_decomp 7/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_7.17_ecdc7da48044ddba_.log
2025-12-04T12:40:05.6880707Z Running 583 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_T_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__batch_norm_with_update_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_lengths_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__softmax_backward_data_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive__upsample_bilinear2d_aa_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acos_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_arange_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_not_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_or_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_right_shift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bmm_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cauchy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chalf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_inverse_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_inverse_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clone_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cross_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_deg2rad_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_digamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_trunc_rounding_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_equal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eye_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft2_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_floor_divide_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_geometric_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_histc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_put_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_int_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isclose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isnan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_istft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_le_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_solve_ex_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svdvals_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logaddexp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logcumsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_xor_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_tensor_overload_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_cumsum_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logaddexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_scatter_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_softmin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matmul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_binary_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_no_dim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mode_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_native_dropout_backward_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_ones_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_alpha_dropout_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose3d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_embedding_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_group_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardshrink_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardswish_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_leaky_relu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_logsigmoid_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_nll_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_poisson_nll_loss_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_prelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_silu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_upsample_nearest_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_3_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_like_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_remainder_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_sum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sigmoid_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_blackman_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_kaiser_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_zeta_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_unbiased_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_lowrank_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_lowrank_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_take_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tensordot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_true_divide_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unique_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsafe_chunk_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_var_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zero__cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__batch_norm_with_update_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__batch_norm_with_update_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick__upsample_bilinear2d_aa_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_all_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_atan_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_or_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_unsafe_split_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_xlogy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_cumprod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_div_floor_rounding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_dot_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_eq_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_exp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_exponential_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_cross_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_huber_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_leaky_relu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_logsigmoid_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_rrelu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_rrelu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_unfold_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_number_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_0_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_neg_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_entr_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_erfcx_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_unbiased_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_tanh_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_tril_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_where_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_int32, test/test_decomp.py::DecompOneOffTestsCUDA::test_exponential_non_inf_cuda
2025-12-04T12:40:05.7095774Z 
2025-12-04T12:40:05.7096100Z Finished test_decomp 7/17 ... [2025-12-04 12:40:05.666866][12363.276773316], took 10.15min
2025-12-04T12:40:05.7097140Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-298d565a78b93d88.xml
2025-12-04T12:40:07.0285465Z Uploading artifacts took 1.15 seconds
2025-12-04T12:40:07.0289201Z Running test_decomp 12/17 ... [2025-12-04 12:40:07.028730][12364.638637754]
2025-12-04T12:40:07.0289716Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:40:07.0294098Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=12', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:40:07.029164]
2025-12-04T12:50:28.0101285Z 
2025-12-04T12:50:28.0102325Z test_decomp 12/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_12.17_884069b3bca145fc_.log
2025-12-04T12:50:28.0300990Z Running 526 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rdiv___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmatmul___cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmul___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rsub___cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive__native_batch_norm_legit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive__native_batch_norm_legit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive__segment_reduce_offsets_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_acosh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_add_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_decomposed_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_aminmax_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_partial_views_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_3d_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bincount_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_block_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bool_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_tensors_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bucketize_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_byte_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cartesian_prod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_char_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cholesky_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_physical_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_constant_pad_nd_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_corrcoef_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cos_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_count_nonzero_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumprod_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagonal_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diff_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dist_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfc_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erfinv_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fill_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flatten_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flipud_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fmin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_ge_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gradient_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_grid_sampler_2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_prod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isreal_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lcm_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cond_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cond_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_diagonal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eig_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_norm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_norm_subgradients_at_zero_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_hermitian_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_svd_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vecdot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_tensor_overload_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log10_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_and_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logspace_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logsumexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mH_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_amax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_argmin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_fill_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logaddexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_mean_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_select_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_sum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_var_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_matrix_exp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_binary_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_maximum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_list_of_tensors_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_meshgrid_variadic_tensors_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mm_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_movedim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_multinomial_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mv_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmedian_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nansum_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ne_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_full_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_batch_norm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_celu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_celu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_channel_shuffle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv_transpose2d_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cosine_similarity_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_dropout3d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_bag_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_fractional_max_pool3d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_glu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_grid_sample_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardtanh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_area_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_l1_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_linear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_normalize_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_circular_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_constant_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_reflect_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_shuffle_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rms_norm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_selu_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softplus_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softsign_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nonzero_static_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_fro_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_nuc_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_normal_number_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_outer_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polar_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_1_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_4_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_put_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_quantile_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randint_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_interleave_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize_as__cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rot90_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_0_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_0_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_round_decimals_3_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsub_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scalar_tensor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amax_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sign_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_general_cosine_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_signal_windows_nuttall_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinc_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_scatter_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_airy_ai_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_j0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_t_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_u_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_h_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_hermite_polynomial_he_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1e_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_laguerre_polynomial_l_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_log_ndtr_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_k1_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sqrt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_complex64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_squeeze_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sub_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_svd_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_t_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tile_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_transpose_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triangular_solve_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trunc_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unflatten_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_xlogy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick__chunk_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_addmm_decomposed_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_addmv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_addr_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_asinh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_atan2_cuda_int8, 
test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_atanh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_or_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_right_shift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_block_diag_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_clone_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_copysign_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_diag_embed_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_count_nonzero_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_scatter_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_div_floor_rounding_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_div_no_rounding_mode_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_int32, 
test/test_decomp.py::TestDecompCUDA::test_quick_empty_strided_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_erf_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_exp_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_expand_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_fftn_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_rfftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_floor_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_floor_divide_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_full_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_gcd_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_ge_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_geometric_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_index_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_index_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_isinf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_lgamma_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_log1p_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_log_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_log_softmax_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logaddexp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nan_to_num_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nansum_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ne_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_new_ones_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardswish_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardtanh_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mish_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_softshrink_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_norm_fro_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_normal_in_place_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_permute_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_pow_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_prod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_randn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_reciprocal_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_renorm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_rot90_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_rsqrt_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_sgn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_sigmoid_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_slice_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtri_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_split_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_split_list_args_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_squeeze_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_std_mean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_sum_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_t_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_tan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_unsafe_split_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_var_unbiased_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_vdot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_view_copy_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_xlogy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_zero__cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_zeros_like_cuda_int32, test/test_decomp.py::DecompOneOffTestsCUDA::test_elu_backward_cuda, test/test_decomp.py::HasDecompTest::test_mm_decompose_mm_dde
2025-12-04T12:50:28.0505065Z 
2025-12-04T12:50:28.0505404Z Finished test_decomp 12/17 ... [2025-12-04 12:50:28.010711][12985.620617205], took 10.35min
2025-12-04T12:50:28.0506454Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-da1c924c8984f5ba.xml
2025-12-04T12:50:28.1219671Z Running test_decomp 17/17 ... [2025-12-04 12:50:28.121624][12985.731530802]
2025-12-04T12:50:28.1220377Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:50:28.1223198Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_decomp.py', '--shard-id=17', '--num-shards=17', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:50:28.122058]
2025-12-04T12:59:27.6090513Z 
2025-12-04T12:59:27.6091440Z test_decomp 17/17 was successful, full logs can be found in artifacts with path test/test-reports/test_decomp_17.17_4ba2ec57e0bb6714_.log
2025-12-04T12:59:27.6292837Z Running 535 items in this shard: test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_H_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___getitem___cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___radd___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rand___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmatmul___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rmod___cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive___ror___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rpow___cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rxor___cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive___rxor___cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive__chunk_cat_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addbmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addmm_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_addr_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_alias_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_all_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_amin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_angle_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_any_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argsort_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_argwhere_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_as_strided_scatter_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_asinh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atan_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_atleast_2d_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bfloat16_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_left_shift_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_bitwise_xor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_broadcast_to_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ceil_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cfloat_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_chunk_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_clamp_max_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_column_stack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_combinations_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_conj_cuda_complex32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_contiguous_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_copysign_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cosh_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummax_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cummin_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_cumsum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diag_embed_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_diagflat_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_div_no_rounding_mode_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_double_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dsplit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_dstack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_einsum_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_like_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_permuted_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_eq_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_erf_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_exp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_as_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expand_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_expm1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftn_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_fftshift_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_hfftn_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ifftshift_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft2_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_ihfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_irfft_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfft2_cuda_float64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_fft_rfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_flip_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_float_power_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_frexp_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_full_like_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_gather_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_half_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hash_tensor_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hsplit_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hstack_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_hypot_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_i0_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_add_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_amin_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_reduce_mean_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_index_select_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isfinite_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isneginf_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_isposinf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_2inputs_2outputs_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_jiterator_unary_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ldexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lerp_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_lgamma_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cholesky_ex_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_cross_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvals_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_eigvalsh_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_inv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_ldl_factor_ex_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lstsq_grad_oriented_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_factor_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_lu_solve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_matrix_rank_hermitian_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_singular_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_pinv_singular_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_slogdet_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorinv_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_tensorsolve_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linalg_vander_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_linspace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log1p_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_log_softmax_with_dtype_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logdet_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_not_cuda_bool, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_logical_or_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_logit_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_long_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mT_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_logsumexp_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_normalize_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_normalize_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_masked_std_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_no_dim_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_max_reduction_with_dim_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_median_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_min_reduction_with_dim_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_minimum_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_msort_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mul_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nanmean_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_narrow_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_empty_strided_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_new_zeros_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nextafter_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool1d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_avg_pool2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_bilinear_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv2d_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_conv3d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_cross_entropy_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_ctc_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_embedding_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_gaussian_nll_loss_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_glu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_hardsigmoid_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_huber_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_bicubic_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_linear_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_interpolate_nearest-exact_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_linear_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_margin_ranking_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_pool1d_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_max_unpool3d_grad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mish_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_mse_loss_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pad_replicate_negative_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pairwise_distance_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pdist_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_pixel_unshuffle_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_prelu_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu6_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_relu_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_rrelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_softmin_with_dtype_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_tanhshrink_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_norm_inf_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ones_like_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pca_lowrank_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_permute_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_polygamma_polygamma_n_2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_positive_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_pow_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_prod_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_qr_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rad2deg_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_randn_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_ravel_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_real_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reciprocal_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_renorm_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_repeat_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_as_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_reshape_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resize__cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_conj_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_resolve_neg_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_roll_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_rsqrt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_add_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_amin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_scatter_reduce_prod_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_select_scatter_cuda_int16, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_sgn_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sinh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_slice_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_softmax_with_dtype_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_mm_reduce_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sparse_sampled_addmm_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_bessel_y1_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_chebyshev_polynomial_v_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_erfcx_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i0e_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_i1_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_legendre_polynomial_p_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_modified_bessel_i0_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_scaled_modified_bessel_k0_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_shifted_chebyshev_polynomial_v_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_spherical_bessel_j0_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_special_xlog1py_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_list_args_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_copy_cuda_uint8, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_split_with_sizes_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_stack_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_std_mean_unbiased_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_sum_to_size_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tan_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_to_sparse_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_torch_ops_aten__safe_softmax_default_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trace_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapezoid_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_trapz_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_tril_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_triu_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unbind_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unfold_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_uniform_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unravel_index_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_unsqueeze_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_as_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_copy_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_view_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vsplit_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_vstack_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_where_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_comprehensive_zeros_like_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick__upsample_bilinear2d_aa_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_abs_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_acos_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_acosh_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_add_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_addcdiv_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_amax_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_amin_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_aminmax_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_any_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_arange_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_as_strided_copy_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_baddbmm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_bernoulli_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_left_shift_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_bitwise_not_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_bucketize_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_cat_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_ceil_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_conj_physical_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_constant_pad_nd_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward__softmax_backward_data_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_index_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_logaddexp2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_norm_nuc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_roll_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_sum_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_core_backward_vdot_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_cos_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_cosh_cuda_float32, 
test/test_decomp.py::TestDecompCUDA::test_quick_cumsum_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_diag_embed_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_copy_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_diagonal_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_digamma_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_dist_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_div_floor_rounding_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_div_trunc_rounding_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_empty_like_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_erfc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_expand_copy_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_expm1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_eye_cuda_float8_e5m2, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft2_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfft_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_hfftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifft_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ifftn_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft2_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfft_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_ihfftn_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfft_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_fft_irfftn_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fill_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_flip_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_fmax_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_fmin_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_fmod_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_grid_sampler_2d_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_gt_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_index_fill_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isin_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_isnan_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_isneginf_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_isposinf_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_item_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_le_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_diagonal_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_linalg_vector_norm_cuda_bfloat16, 
test/test_decomp.py::TestDecompCUDA::test_quick_linspace_tensor_overload_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_log10_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_log2_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logical_and_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logical_xor_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_logit_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_logspace_tensor_overload_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_logsumexp_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_masked_fill_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mean_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_list_of_tensors_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_meshgrid_variadic_tensors_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_minimum_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_mul_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_mv_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_narrow_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_neg_cuda_complex128, 
test/test_decomp.py::TestDecompCUDA::test_quick_new_empty_strided_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_new_full_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_int16, test/test_decomp.py::TestDecompCUDA::test_quick_new_zeros_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_elu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_gelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardswish_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_hardswish_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_grad_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool2d_grad_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_max_unpool3d_grad_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_mse_loss_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_pad_constant_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_prelu_cuda_float16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu6_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_relu_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_rrelu_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_nn_functional_silu_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_inf_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_norm_nuc_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_float16, 
test/test_decomp.py::TestDecompCUDA::test_quick_normal_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_ones_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_ones_like_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_permute_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_rad2deg_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_remainder_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_roll_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_round_decimals_3_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_rsub_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_select_cuda_complex32, test/test_decomp.py::TestDecompCUDA::test_quick_select_scatter_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sign_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_signbit_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sin_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_sinc_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_sinh_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_slice_scatter_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_softmax_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float32, test/test_decomp.py::TestDecompCUDA::test_quick_special_i1e_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_special_log_ndtr_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_special_ndtr_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_special_xlog1py_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_special_zeta_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_split_with_sizes_cuda_complex64, test/test_decomp.py::TestDecompCUDA::test_quick_stack_cuda_uint8, test/test_decomp.py::TestDecompCUDA::test_quick_sub_cuda_int64, test/test_decomp.py::TestDecompCUDA::test_quick_t_cuda_int64, 
test/test_decomp.py::TestDecompCUDA::test_quick_take_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_trace_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_copy_cuda_float64, test/test_decomp.py::TestDecompCUDA::test_quick_transpose_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_triu_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_trunc_cuda_int32, test/test_decomp.py::TestDecompCUDA::test_quick_unbind_cuda_int8, test/test_decomp.py::TestDecompCUDA::test_quick_unfold_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_bool, test/test_decomp.py::TestDecompCUDA::test_quick_unsqueeze_copy_cuda_complex128, test/test_decomp.py::TestDecompCUDA::test_quick_var_mean_cuda_bfloat16, test/test_decomp.py::TestDecompCUDA::test_quick_view_cuda_bool, test/test_decomp.py::DecompOneOffTestsCUDA::test_native_layer_norm_cpu_decomp_cuda, test/test_decomp.py::DecompOneOffTestsCUDA::test_sdpa_nn_functional_scaled_dot_product_attention_cuda_float32
2025-12-04T12:59:27.6490448Z 
2025-12-04T12:59:27.6490796Z Finished test_decomp 17/17 ... [2025-12-04 12:59:27.609603][13525.219510278], took 8.99min
2025-12-04T12:59:27.6491837Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_decomp/test_decomp-20c517b051912976.xml
2025-12-04T12:59:27.7248121Z Running test_meta 5/5 ... [2025-12-04 12:59:27.724444][13525.334350814]
2025-12-04T12:59:27.7248716Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T12:59:27.7251829Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_meta.py', '--shard-id=5', '--num-shards=5', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 12:59:27.724878]
2025-12-04T13:25:55.7017850Z 
2025-12-04T13:25:55.7018822Z test_meta 5/5 was successful, full logs can be found in artifacts with path test/test-reports/test_meta_5.5_1a0c05f4e7432569_.log
2025-12-04T13:25:56.0416318Z Running 8325 items in this shard: test/test_meta.py::TestMetaConverter::test_channels_last_non_leaf, test/test_meta.py::TestMetaConverter::test_tensor_outlives_converter, test/test_meta.py::TestMetaConverter::test_view_of_leaf, test/test_meta.py::TestMetaCUDA::test_batch_norm_backward_output_mask3_cuda, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype__refs_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_div_floor_rounding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_logical_or_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_ne_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_binary_ufuncs_mixed_dtype_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_H_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_T_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___getitem___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rand___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rmul___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rpow___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rsub___cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace___rxor___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_addcdiv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_ceil_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_clamp_min_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_div_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_exp_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lerp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log2_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_maximum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sinh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_trunc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__native_batch_norm_legit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__softmax_backward_data_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_acosh_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmm_decomposed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_alias_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_allclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_aminmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argsort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_as_strided_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atan2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bernoulli_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bfloat16_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_left_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_block_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bool_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_tensors_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_broadcast_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_bucketize_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ceil_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cfloat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chalf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_char_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_inverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_solve_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_column_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_conj_physical_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_contiguous_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_copysign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cov_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cummax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_cumulative_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_deg2rad_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_digamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_no_rounding_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_div_trunc_rounding_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_permuted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_empty_strided_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_equal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_copy_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expand_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_eye_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_hfftn_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_irfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fliplr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_flipud_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_float_power_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_floor_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_fmod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_full_like_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_geometric_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gradient_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_gt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_half_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_histc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_hstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_imag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_inner_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isfinite_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isin_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isposinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_item_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_2inputs_2outputs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_binary_return_by_ref_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_jiterator_unary_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_kthvalue_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_le_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_inv_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_factor_ex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_grad_oriented_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lstsq_grad_oriented_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_multi_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_norm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_pinv_singular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_slogdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_linspace_tensor_overload_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log1p_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mH_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumprod_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_masked_var_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_max_reduction_with_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_median_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_meshgrid_variadic_tensors_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_movedim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_msort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nan_to_num_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nansum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_empty_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_new_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nextafter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_adaptive_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_conv_transpose3d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_dropout_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_bag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_glu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_bilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_kl_div_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_margin_ranking_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_max_unpool1d_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_mse_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_head_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multi_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_circular_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_reflect_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_pixel_unshuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rms_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softmin_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softplus_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_bilinear_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nn_functional_upsample_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_nonzero_static_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_in_place_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_number_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_normal_number_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_2_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_positive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_quantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_randn_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_ravel_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_repeat_interleave_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_reshape_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsqrt_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scalar_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_searchsorted_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signal_windows_general_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_signbit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sparse_sampled_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_bessel_y1_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_u_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_entr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_erfcx_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_hermite_polynomial_he_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i0e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_legendre_polynomial_p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_spherical_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_xlog1py_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_special_zeta_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_list_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_split_with_sizes_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_squeeze_multiple_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_stack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_std_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_svd_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_t_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_along_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_take_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_to_sparse_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_topk_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triangular_solve_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_true_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unflatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unfold_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_uniform_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unique_cuda_uint32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsafe_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_complex_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_vstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_inplace_zeros_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_H_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_T_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rpow___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace___rsub___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_atan_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_ceil_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_floor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_log_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_tanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_trunc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__foreach_zero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_lengths_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace__upsample_bilinear2d_aa_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmm_decomposed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_alias_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_all_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_allclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_angle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_any_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_argwhere_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_partial_views_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_as_strided_scatter_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_asinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_baddbmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bfloat16_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bincount_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_and_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_byte_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ceil_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_inverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_solve_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_clone_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_complex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_constant_pad_nd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_contiguous_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_copysign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_corrcoef_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cov_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cummin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diag_embed_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagflat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diagonal_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_diff_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_digamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_floor_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_no_rounding_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_dsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_permuted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eq_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_equal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_erfinv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exponential_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_eye_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft2_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifft_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_irfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fft_rfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_flipud_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_float_power_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_floor_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_full_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gcd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geometric_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_geqrf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gradient_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_gt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_half_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_copy_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_reduce_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isclose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isfinite_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isnan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isneginf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_isreal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_2inputs_2outputs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_jiterator_unary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kron_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_kthvalue_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lerp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cond_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_inv_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_multi_dot_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_slogdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_solve_triangular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_svdvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vander_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vector_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_linspace_tensor_overload_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log10_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log1p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logaddexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logcumsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_not_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_or_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logit_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amax_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumprod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_scatter_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_masked_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_max_reduction_with_dim_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_median_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_min_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_movedim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_msort_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mul_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_narrow_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_batch_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_native_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ne_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_new_zeros_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_alpha_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_channel_shuffle_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_conv_transpose3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_cross_entropy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_fractional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_glu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardswish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_instance_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_l1_loss_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mish_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multi_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_normalize_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_circular_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pad_replicate_negative_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pairwise_distance_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_poisson_nll_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_prelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu6_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rms_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_rms_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_selu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_soft_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softmin_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_tanhshrink_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nn_functional_upsample_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_nonzero_static_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_norm_nuc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ones_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ormqr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_3_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_positive_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_put_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_qr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randint_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_real_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reciprocal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_repeat_interleave_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_reshape_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resize_as__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_conj_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_roll_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rot90_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_0_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_rsub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scalar_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sgn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_short_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sin_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sparse_mm_reduce_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_airy_ai_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_bessel_y1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_entr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_erfcx_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_hermite_polynomial_he_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i0e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_i1e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_log_ndtr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_i1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_ndtri_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_xlog1py_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_split_with_sizes_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_stack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_std_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_sum_to_size_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tanh_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_topk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_triu_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_true_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unbind_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unflatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_consecutive_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unique_cuda_uint16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unravel_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_chunk_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_unsqueeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_var_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_view_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_vstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_meta_outplace_zeros_like_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_H_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_T_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___getitem___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___ror___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace___rsub___cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcdiv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_div_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_erfc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_expm1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_floor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lerp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log1p_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_log_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_mul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_neg_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_rsqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__softmax_backward_data_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_acosh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addbmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_atan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_maximum_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__foreach_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides__segment_reduce_lengths_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addmv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_atan2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_block_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diag_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_dist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_equal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_ihfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_flatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_floor_divide_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_gather_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ge_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_gt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_inner_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_item_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_jiterator_2inputs_2outputs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_diagonal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_ldl_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_lstsq_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linalg_tensorinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_linspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_mode_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_movedim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_narrow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_new_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_avg_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_conv_transpose1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_fractional_max_pool3d_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_mse_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_ones_like_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_bartlett_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_i0e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_special_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_split_with_sizes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_svd_lowrank_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_unravel_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_view_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_all_strides_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_aminmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_angle_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_arange_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_argwhere_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_as_strided_scatter_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_2d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_baddbmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bernoulli_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_left_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_not_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_bitwise_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_block_diag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cartesian_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cauchy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cdouble_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chalf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_char_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_inverse_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cholesky_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_clone_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_combinations_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_conj_physical_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_constant_pad_nd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_copysign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cos_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cosh_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_count_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cummax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_cumulative_trapezoid_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_deg2rad_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagflat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diagonal_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_diff_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_digamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dist_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_div_trunc_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_double_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_dstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_permuted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eq_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_equal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft2_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_hfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ifftshift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft2_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_irfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fft_rfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_float_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_floor_divide_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_fmod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_full_like_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gather_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geometric_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geqrf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_gradient_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_grid_sampler_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_half_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hash_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hsplit_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_hstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_imag_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_mean_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_index_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isclose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isfinite_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_istft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_item_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_2inputs_2outputs_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kron_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_kthvalue_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lcm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cholesky_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cond_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_det_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eig_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_eigvalsh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_factor_ex_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_hermitian_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_slogdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_solve_triangular_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_svdvals_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_normal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_log_softmax_with_dtype_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_and_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_or_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_long_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mH_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumprod_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_fill_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmax_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_std_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_masked_var_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_max_reduction_with_dim_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_maximum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_median_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_list_of_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_min_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_movedim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_msort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nan_to_num_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nansum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_empty_strided_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_ones_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_alpha_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_channel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose2d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_conv_transpose3d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_cosine_embedding_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_dropout3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_elu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_elu_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_bag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_fractional_max_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_gaussian_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_gaussian_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_group_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_hardtanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_instance_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_instance_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_area_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_bicubic_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_linear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_interpolate_trilinear_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_kl_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_l1_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_l1_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_leaky_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_margin_ranking_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_max_unpool3d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_mish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multi_head_attention_forward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_circular_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_reflect_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pad_replicate_negative_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pairwise_distance_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_shuffle_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_pixel_unshuffle_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu6_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_relu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_scaled_dot_product_attention_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_selu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_soft_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softmin_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_softplus_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_tanhshrink_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_threshold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nn_functional_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_nonzero_static_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_norm_inf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_in_place_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_normal_in_place_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ones_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ormqr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pca_lowrank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_permute_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pinverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polar_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_polygamma_polygamma_n_4_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_positive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_pow_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rand_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randint_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_randn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_ravel_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_real_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reciprocal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_renorm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_repeat_interleave_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_reshape_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_round_decimals_neg_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scalar_tensor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_searchsorted_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_gaussian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signal_windows_general_cosine_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_signbit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_airy_ai_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_j1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_erfcx_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_h_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_hermite_polynomial_he_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_log_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_scaled_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_spherical_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_xlog1py_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_special_zeta_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_list_args_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_split_with_sizes_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_squeeze_multiple_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_std_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_sum_to_size_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_svd_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_t_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tensor_split_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tile_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_tril_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_triu_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_true_divide_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_trunc_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unbind_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unflatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_uniform_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_consecutive_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unique_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unravel_index_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unravel_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_unsqueeze_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_var_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_xlogy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_inplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_T_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___getitem___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___radd___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rdiv___cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmatmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rsub___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rxor___cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace___rxor___cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__chunk_cat_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_add_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcdiv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_atan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_div_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_exp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_floor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_frac_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lerp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_lgamma_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_log_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_max_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_maximum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_mul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_neg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_norm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_pow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_reciprocal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_round_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sign_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sin_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sqrt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_sub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_tanh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_trunc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__segment_reduce_offsets_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__softmax_backward_data_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__upsample_bilinear2d_aa_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_acosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addcmul_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_alias_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_T_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___rand___cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___rmod___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides___rmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_abs_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_clamp_min_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_cosh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_sign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__foreach_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__native_batch_norm_legit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides__upsample_bilinear2d_aa_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_addr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_argmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_argwhere_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bernoulli_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_not_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_bitwise_right_shift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_broadcast_shapes_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cdouble_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_diag_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_eq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_hfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_flip_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_gradient_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_imag_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_index_reduce_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isfinite_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isinf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_isneginf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_le_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_eigh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_lstsq_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_matrix_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_matrix_rank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_pinv_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_linalg_tensorsolve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logcumsumexp_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_logical_not_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_lu_unpack_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_masked_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_matrix_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_max_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_min_reduction_with_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nanmean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_narrow_copy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_new_empty_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_new_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_dropout2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_dropout3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_embedding_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_interpolate_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_logsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_margin_ranking_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_max_unpool2d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_multi_head_attention_forward_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_relu6_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_softsign_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_threshold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_nn_functional_upsample_nearest_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_norm_nuc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_outer_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_resize_as__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_rot90_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_round_decimals_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scalar_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_blackman_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_signal_windows_hann_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_slice_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_airy_ai_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_bessel_y0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_chebyshev_polynomial_w_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_modified_bessel_k1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_std_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_std_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_t_copy_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_take_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tensor_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_tril_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_unsqueeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_view_as_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_where_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_zero__cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_all_strides_zeros_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amax_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_aminmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_angle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_arange_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argsort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_argwhere_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_partial_views_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_asinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_1d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_baddbmm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_right_shift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bitwise_xor_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_block_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_bool_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_broadcast_to_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_byte_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cdouble_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ceil_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cfloat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chalf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_inverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cholesky_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_clone_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_column_stack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_combinations_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_complex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_conj_physical_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_constant_pad_nd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_contiguous_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_copysign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_corrcoef_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cos_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_count_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cov_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cummin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumprod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_cumulative_trapezoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_diff_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_digamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_floor_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_div_trunc_rounding_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_einsum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_permuted_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eq_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_equal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_expm1_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_exponential_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e4m3fnuz, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_hfftn_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ifftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_ihfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft2_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flatten_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fliplr_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_floor_divide_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_fmin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_full_like_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gather_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_geometric_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gradient_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_grid_sampler_2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_gt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_half_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hash_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_heaviside_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_i0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_imag_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_reduce_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_index_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isfinite_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isnan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isposinf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_isreal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_item_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_binary_return_by_ref_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_jiterator_unary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kron_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_kthvalue_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lcm_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ldexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lerp_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cholesky_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_det_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_diagonal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eig_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_inv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_factor_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_factor_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_solve_triangular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vander_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_linspace_tensor_overload_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_log_softmax_with_dtype_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logaddexp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logcumsumexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logcumsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logcumsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_not_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logspace_tensor_overload_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_logsumexp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_long_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_lu_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mH_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mT_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_amin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmin_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_argmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_normalize_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_softmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_matrix_exp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_binary_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_pool2d_with_indices_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_pool2d_with_indices_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_max_reduction_no_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_median_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_list_of_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_meshgrid_variadic_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_no_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_min_reduction_with_dim_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mode_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_movedim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_msort_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nan_to_num_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nanmedian_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nansum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_narrow_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_batch_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_native_dropout_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_empty_strided_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_full_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_new_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nextafter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_channel_shuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_conv_transpose1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_dropout_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_elu_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_elu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_bag_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_embedding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_gelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_group_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardswish_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_hardtanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_instance_norm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_nearest_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_l1_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_leaky_relu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_local_response_norm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_grad_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_mse_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multi_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_multilabel_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_circular_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_constant_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_poisson_nll_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_prelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rms_norm_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rrelu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_rrelu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_selu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_complex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_silu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_smooth_l1_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softmin_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_softsign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_threshold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nn_functional_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_nonzero_static_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_fro_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_norm_nuc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_normal_number_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ones_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_outer_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_permute_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_polygamma_polygamma_n_4_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randint_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_randn_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_ravel_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_reciprocal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_remainder_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_repeat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize__cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resize_as__cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_conj_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_roll_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rot90_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_round_decimals_neg_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_rsub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scalar_tensor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_prod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_scatter_reduce_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_searchsorted_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_select_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sgn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sigmoid_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_bartlett_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_blackman_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_cosine_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signal_windows_nuttall_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_signbit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_slice_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sparse_mm_reduce_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_j1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_bessel_y0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_chebyshev_polynomial_w_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_entr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_erfcx_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_laguerre_polynomial_l_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_legendre_polynomial_p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_log_ndtr_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_i1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_modified_bessel_k0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_ndtri_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_scaled_modified_bessel_k1_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_shifted_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_spherical_bessel_j0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_xlog1py_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_list_args_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_split_with_sizes_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_square_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_squeeze_multiple_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stack_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_std_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_sum_to_size_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_along_dim_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_take_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensor_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tensordot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_to_sparse_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_topk_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trace_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapezoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_trapz_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triangular_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_tril_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_triu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_true_divide_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unbind_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unflatten_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_consecutive_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unique_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unravel_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_chunk_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsafe_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_var_unbiased_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_cuda_float64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_as_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_view_cuda_float16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_vstack_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_where_cuda_int16, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_xlogy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_dispatch_symbolic_meta_outplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_empty_quantized_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask0_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask4_cuda, test/test_meta.py::TestMetaCUDA::test_layer_norm_backward_output_mask5_cuda, test/test_meta.py::TestMetaCUDA::test_meta__fused_moving_avg_obs_fq_helper_cuda, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_H_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_T_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rdiv___cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmatmul___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmod___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rmul___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace___rpow___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__chunk_cat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_acos_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_add_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcdiv_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_asin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_atan_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_ceil_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_erf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_exp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_frac_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_lgamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log10_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log1p_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_log_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_max_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_norm_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_pow_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sigmoid_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sinh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_tanh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_trunc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__foreach_zero_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_lengths_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__segment_reduce_offsets_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__softmax_backward_data_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace__unsafe_masked_index_put_accumulate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace__upsample_bilinear2d_aa_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_abs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_acosh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addbmm_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcdiv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addmm_decomposed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_addr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_alias_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_all_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_aminmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_angle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_any_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_argmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_argwhere_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_as_strided_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_asinh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_1d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_2d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_atleast_3d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_baddbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bernoulli_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bfloat16_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_or_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_right_shift_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bmm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bool_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_tensors_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_broadcast_to_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_bucketize_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_byte_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cartesian_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cat_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cdouble_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ceil_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cfloat_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_chalf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_char_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cholesky_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_chunk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_column_stack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_combinations_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_conj_physical_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_constant_pad_nd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_copysign_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_corrcoef_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_count_nonzero_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cov_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cross_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cummin_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_cumsum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_deg2rad_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diag_embed_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diagonal_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_diff_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dist_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_no_rounding_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_div_trunc_rounding_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dot_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_double_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_dstack_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_permuted_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_empty_strided_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_equal_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_as_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expand_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_eye_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft2_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_fftshift_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_hfftn_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ifftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_ihfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_irfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fft_rfft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flip_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_flipud_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_float_power_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_floor_divide_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_fmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_full_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ge_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geqrf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geqrf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_geqrf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_gt_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_half_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hash_tensor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_heaviside_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_histc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_hypot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_i0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_igammac_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_fill_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amax_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_amin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_mean_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_reduce_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_index_select_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_inner_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_int_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isfinite_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isinf_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isnan_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isneginf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isposinf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_isreal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_istft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_item_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_2inputs_2outputs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_4inputs_with_extra_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_binary_return_by_ref_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_jiterator_unary_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kron_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lcm_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ldexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_le_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_lerp_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lgamma_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cholesky_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cond_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_det_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_diagonal_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eig_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_eigvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_ldl_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_lu_solve_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_matrix_rank_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_multi_dot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_multi_dot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_norm_subgradients_at_zero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_hermitian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_singular_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_pinv_singular_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_svdvals_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vander_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linalg_vector_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_linspace_tensor_overload_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log10_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_log_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_and_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_not_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_or_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logical_xor_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_logsumexp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_long_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_lt_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mH_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mT_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_amin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_argmin_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumprod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_cumsum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_fill_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_normalize_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_select_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_std_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_masked_var_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matmul_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_matrix_exp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_no_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_max_reduction_with_dim_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_median_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_meshgrid_variadic_tensors_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_no_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_minimum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mode_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_movedim_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_1_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_mvlgamma_mvlgamma_p_5_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nan_to_num_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanmedian_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nanquantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_narrow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_native_layer_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_neg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_empty_strided_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_full_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_ones_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_new_zeros_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_avg_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_avg_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_bilinear_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_celu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_channel_shuffle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_conv_transpose1d_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_similarity_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_similarity_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_cosine_similarity_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_ctc_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_dropout_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_elu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_embedding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_with_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_feature_alpha_dropout_without_train_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool2d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_grid_sample_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_grid_sample_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardsigmoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hardtanh_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_hinge_embedding_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_huber_loss_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_huber_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_instance_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_bicubic_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_linear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_trilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_trilinear_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_interpolate_trilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_layer_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_leaky_relu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_logsigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_margin_ranking_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool1d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mish_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_mse_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_circular_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_constant_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pad_replicate_negative_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pairwise_distance_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_shuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_pixel_unshuffle_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_poisson_nll_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu6_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_relu_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_rms_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_complex_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_silu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_soft_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softmin_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softplus_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_softsign_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_tanhshrink_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nn_functional_unfold_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_nonzero_static_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_fro_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_norm_inf_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_normal_in_place_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pca_lowrank_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_permute_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polar_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_1_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_positive_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_pow_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_put_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rad2deg_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rand_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randint_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_randn_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_real_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reciprocal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_remainder_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_renorm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_repeat_interleave_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_reshape_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize__cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_conj_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_resolve_neg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rot90_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_round_decimals_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_rsqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scalar_tensor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_searchsorted_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_select_scatter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sgn_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_short_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sigmoid_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_cosine_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_signal_windows_general_cosine_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_signbit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_slice_scatter_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_softmax_with_dtype_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_sampled_addmm_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_j1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_bessel_y0_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_t_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_v_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_entr_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_erfcx_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_hermite_polynomial_h_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_log_ndtr_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_modified_bessel_k0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_polygamma_special_polygamma_n_0_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_scaled_modified_bessel_k1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_t_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_shifted_chebyshev_polynomial_w_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_xlog1py_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_special_zeta_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_list_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_split_with_sizes_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sqrt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_std_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sub_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_sum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_copy_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_along_dim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_take_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensor_split_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tensordot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tile_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_to_sparse_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_topk_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trace_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_transpose_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapezoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_tril_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_true_divide_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_trunc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unbind_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unflatten_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unfold_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_uniform_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_chunk_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsafe_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_unsqueeze_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_var_mean_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_as_real_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_vstack_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_where_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_inplace_xlogy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_inplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_H_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___getitem___cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___radd___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rand___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rdiv___cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmatmul___cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rmul___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace___ror___cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rpow___cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace___rsub___cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__batch_norm_with_update_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__chunk_cat_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_abs_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_acos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_add_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcdiv_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_addcmul_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_asin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_atan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_ceil_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_max_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_clamp_min_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_cosh_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_div_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_erfc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_exp_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_expm1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_floor_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_frac_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lerp_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log10_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_max_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_maximum_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_minimum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_round_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_rsqrt_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sign_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sin_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sinh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sqrt_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_sub_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_tanh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_trunc_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__foreach_zero_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__native_batch_norm_legit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace__softmax_backward_data_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace__unsafe_masked_index_put_accumulate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__upsample_bilinear2d_aa_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace__upsample_bilinear2d_aa_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_abs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_acosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_add_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addbmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcdiv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addcmul_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addmv_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_addr_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_alias_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_all_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_allclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_amin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_aminmax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_angle_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_any_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_arange_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argmin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argsort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_argwhere_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_as_strided_partial_views_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_asin_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_asinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_1d_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_2d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_atleast_3d_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bfloat16_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bincount_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_and_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_left_shift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_or_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bitwise_xor_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_block_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bool_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_broadcast_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_bucketize_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_byte_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cartesian_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_cat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdist_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cdouble_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ceil_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cfloat_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_char_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_inverse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cholesky_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_max_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clamp_min_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_clone_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_column_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_combinations_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_conj_physical_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_constant_pad_nd_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_contiguous_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_corrcoef_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cos_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cosh_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_count_nonzero_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cov_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cross_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummax_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cummin_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumprod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumsum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_cumulative_trapezoid_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_deg2rad_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diag_embed_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_int64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagflat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diagonal_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_diff_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_digamma_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_floor_rounding_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_no_rounding_mode_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_div_trunc_rounding_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dot_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_double_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_dstack_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_einsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_like_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_empty_strided_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_erfinv_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exp2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_as_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expand_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_expm1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_float8_e5m2, test/test_meta.py::TestMetaCUDA::test_meta_outplace_eye_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_fftshift_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfft_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_hfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft2_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifft_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ifftshift_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft2_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_ihfft_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft2_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfft_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_irfftn_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft2_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfft_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fft_rfftn_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fill_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_bool, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flatten_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flip_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fliplr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_flipud_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_float_power_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmax_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_fmod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_frexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_full_like_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gather_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gcd_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ge_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gradient_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_grid_sampler_3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_gt_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_half_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hash_tensor_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_heaviside_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_histc_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hypot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_hypot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_i0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_igammac_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_imag_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_add_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_put_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amax_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_amin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_reduce_prod_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_index_select_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_int_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isclose_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isfinite_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isinf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isnan_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_isneginf_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_isreal_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_item_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_2inputs_2outputs_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_4inputs_with_extra_args_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_jiterator_binary_return_by_ref_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_kthvalue_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lcm_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ldexp_cuda_int32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_le_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lerp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lgamma_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cholesky_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cond_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_cross_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_det_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_diagonal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eig_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigh_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvals_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvalsh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_eigvalsh_cuda_complex64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_householder_product_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_inv_ex_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_ex_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_ldl_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lstsq_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_lu_factor_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_matrix_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_norm_subgradients_at_zero_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_slogdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_slogdet_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_ex_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_solve_triangular_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_svd_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_tensorinv_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vander_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vecdot_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vector_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linalg_vector_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_linspace_tensor_overload_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log10_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log1p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_log2_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_normal_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_log_softmax_with_dtype_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp2_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logaddexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logdet_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_and_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_not_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_or_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logical_xor_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logit_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logspace_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_logsumexp_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_long_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_solve_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_solve_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_lu_unpack_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mH_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mT_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_amin_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_argmin_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumprod_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_cumsum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_fill_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_log_softmax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_log_softmax_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logaddexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_logsumexp_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_prod_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_scatter_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_select_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_std_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_sum_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_masked_var_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_matmul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_binary_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_max_pool2d_with_indices_backward_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_maximum_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_median_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_list_of_tensors_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_meshgrid_variadic_tensors_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_binary_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_no_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_min_reduction_with_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_minimum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mode_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_movedim_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_msort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_complex32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_mul_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_multinomial_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_multinomial_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_multinomial_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_3_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nan_to_num_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanmean_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nanquantile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nansum_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_narrow_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_batch_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_dropout_backward_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_dropout_backward_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_native_layer_norm_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ne_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_empty_strided_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_full_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_ones_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_new_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nextafter_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_avg_pool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_adaptive_max_pool3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_avg_pool1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_batch_norm_without_cudnn_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_bilinear_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_binary_cross_entropy_with_logits_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_celu_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_celu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_channel_shuffle_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv3d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose1d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose2d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_conv_transpose3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cosine_embedding_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_cross_entropy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_dropout_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_embedding_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_with_train_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_fractional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gaussian_nll_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_gelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_glu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_grid_sample_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardshrink_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardsigmoid_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardswish_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hardtanh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_hinge_embedding_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_huber_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_area_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_bilinear_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_nearest_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_interpolate_trilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_kl_div_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_kl_div_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_l1_loss_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_layer_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_leaky_relu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_leaky_relu_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_linear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_local_response_norm_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_local_response_norm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_local_response_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_logsigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_margin_ranking_loss_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool2d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_pool3d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool1d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool2d_grad_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_max_unpool3d_grad_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_head_attention_forward_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multi_margin_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_multilabel_soft_margin_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_normalize_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_circular_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_constant_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_reflect_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pad_replicate_negative_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pairwise_distance_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pdist_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_shuffle_cuda_uint8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_pixel_unshuffle_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_poisson_nll_loss_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_prelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu6_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_relu_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rms_norm_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rrelu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_rrelu_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_scaled_dot_product_attention_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_selu_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_selu_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_smooth_l1_loss_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softmin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softplus_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_softplus_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_tanhshrink_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_threshold_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nn_functional_upsample_bilinear_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_nonzero_static_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_fro_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_inf_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_norm_inf_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_normal_number_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ones_like_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ormqr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_outer_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pca_lowrank_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pca_lowrank_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_permute_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pinverse_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_0_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_3_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_polygamma_polygamma_n_4_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_positive_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_pow_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_prod_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_put_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_qr_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_qr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_quantile_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rad2deg_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rand_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randint_like_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_randn_like_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_ravel_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_real_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reciprocal_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_remainder_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_repeat_interleave_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_as_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_reshape_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize__cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resize_as__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_conj_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_resolve_neg_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_roll_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rot90_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_round_decimals_3_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsqrt_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_rsub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scalar_tensor_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_add_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amax_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_amin_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_bfloat16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_mean_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_scatter_reduce_sum_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_select_scatter_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sgn_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_short_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sigmoid_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sign_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_exponential_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_exponential_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_gaussian_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_general_hamming_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_hamming_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signal_windows_nuttall_cuda_float64, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_signbit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sin_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinc_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sinh_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_slice_scatter_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_softmax_with_dtype_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sort_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_mm_reduce_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sparse_sampled_addmm_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_airy_ai_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_j1_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y0_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_bessel_y1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_t_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_u_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_v_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_chebyshev_polynomial_w_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_entr_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_erfcx_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_hermite_polynomial_he_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i0e_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_i1e_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_laguerre_polynomial_l_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_legendre_polynomial_p_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_log_ndtr_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_i1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_modified_bessel_k1_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtr_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_ndtri_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k0_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_scaled_modified_bessel_k1_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_spherical_bessel_j0_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_xlog1py_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_special_zeta_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_float32, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_list_args_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_split_with_sizes_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sqrt_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_square_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_squeeze_multiple_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_complex128, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_stack_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_mean_unbiased_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_std_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stft_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_stft_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sub_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_sum_to_size_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_svd_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_t_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_along_dim_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_take_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tan_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_int16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_tanh_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tensor_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tile_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_to_sparse_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_topk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch__scaled_mm_cuda_float8_e4m3fn, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_torch_ops_aten__safe_softmax_default_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trace_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_transpose_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapezoid_cuda_float16, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_trapz_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triangular_solve_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_tril_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_indices_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_triu_indices_cuda_int64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_true_divide_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_copy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unbind_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unflatten_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_copy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unfold_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_uniform_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unique_consecutive_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_chunk_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsafe_split_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_copy_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_unsqueeze_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_mean_unbiased_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_var_unbiased_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vdot_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_as_complex_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_copy_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_bool, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_view_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_float32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_int8, 
test/test_meta.py::TestMetaCUDA::test_meta_outplace_vsplit_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_vstack_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_complex32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_where_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_xlogy_cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_bfloat16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_float64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zero__cuda_uint8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_cuda_int8, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_complex128, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_complex64, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_float16, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_int32, test/test_meta.py::TestMetaCUDA::test_meta_outplace_zeros_like_cuda_int64, test/test_meta.py::TestMetaCUDA::test_mixed_dtype_for_native_layer_norm_backward_float16_float16_cuda, test/test_meta.py::TestMetaCUDA::test_mixed_dtype_for_native_layer_norm_backward_float32_float32_cuda, test/test_meta.py::TestMetaCUDA::test_nan_to_num_cuda, test/test_meta.py::TestMetaCUDA::test_nonzero_cuda
2025-12-04T13:25:56.3781506Z 
2025-12-04T13:25:56.3781809Z Finished test_meta 5/5 ... [2025-12-04 13:25:55.712663][15113.322566255], took 26.47min
2025-12-04T13:25:56.3782840Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_meta/test_meta-0566a97fe52d3e43.xml
2025-12-04T13:25:57.1767026Z Uploading artifacts took 1.17 seconds
2025-12-04T13:25:57.1771111Z Running test_nestedtensor 1/4 ... [2025-12-04 13:25:57.176936][15114.786843181]
2025-12-04T13:25:57.1771730Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:25:57.1776382Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '--shard-id=1', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:25:57.177390]
2025-12-04T13:33:42.7729421Z 
2025-12-04T13:33:42.7733145Z test_nestedtensor 1/4 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_1.4_6dff2e85dc80cacf_.log
2025-12-04T13:33:42.7968742Z Running 408 items in this shard: test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_3_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_cat, test/test_nestedtensor.py::TestNestedTensor::test_copy_, test/test_nestedtensor.py::TestNestedTensor::test_jagged_with_dim_error, test/test_nestedtensor.py::TestNestedTensor::test_like_functions_randn_like, test/test_nestedtensor.py::TestNestedTensor::test_nested_tensor, test/test_nestedtensor.py::TestNestedTensor::test_size_dim, test/test_nestedtensor.py::TestNestedTensor::test_unbind_4, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_binary_ops_with_scalar_eq_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_binary_ops_with_scalar_ge_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_detach_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_empty_like_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_is_all_true_jagged_cuda, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_is_any_true_jagged_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_int64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_int8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amin_dtypes_cuda_int64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_int8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_int8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_breaking_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_with_bmm_path_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_with_bmm_path_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_transpose_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_chunk_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_384_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_384_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_8_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_div_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_split_with_sizes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_reshape_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_share_memory_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_gelu_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_logical_not_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_neg_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_inference_mode_interaction_cuda_float16, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_for_add_op_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_for_sub_op_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_dropout_backward_jagged_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_1023_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_bmm_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_mask_and_to_padded_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_from_padded_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_linear_backward_cuda, 
test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_matmul_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_squeeze_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_squeeze_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_to_padded_tensor_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_transpose_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_unsqueeze_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_unbind_flow_through_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_broadcasting_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_chunk_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_compile_padded_dense_conversion_preserves_metadata_cache_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_flatten_decomp_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_index_put_error_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_is_contiguous_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_padded_dense_conversion_kernels_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_2d_input_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_operate_on_batch_dim_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_with_lengths_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_full_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_linear_nt_dim_5_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_transposed_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_with_holes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_with_holes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_backwards_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_contig_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_contig_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_with_holes_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_1_requires_grad_True_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_False_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_False_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_True_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_False_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_specialize_dynamic_shape_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_split_with_sizes_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_4_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unary_pointwise_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_3_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_last_dim_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cfloat_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_chalf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_float_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_fmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_frexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_half_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_index_put_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_embedding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_tanhshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_rad2deg_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_erfcx_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_xlog1py_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_split_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_unflatten_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rpow___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_acos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_bfloat16_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_bmm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_ceil_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cos_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_div_no_rounding_mode_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_exp2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_float_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_fmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_matmul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_min_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_5_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_embedding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_prelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sinc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_xlog1py_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rdiv___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_byte_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_char_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_cosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_deg2rad_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_div_trunc_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_frac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isposinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_jiterator_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_jiterator_binary_return_by_ref_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_jiterator_unary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logaddexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_not_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_or_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_xor_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nansum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_narrow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nextafter_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_logsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sigmoid_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_airy_ai_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_hermite_polynomial_h_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_i0e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_log_ndtr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_xlog1py_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_zeta_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rmul___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bmm_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ceil_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clamp_max_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_cos_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_div_floor_rounding_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_float_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_float_power_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_floor_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_hypot_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_igamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_igammac_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_index_put_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_max_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_minimum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_embedding_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_hardsigmoid_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_y1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_scaled_modified_bessel_k0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_spherical_bessel_j0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_split_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sub_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_to_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_unflatten_cuda_float32
2025-12-04T13:33:42.8203627Z 
2025-12-04T13:33:42.8203974Z Finished test_nestedtensor 1/4 ... [2025-12-04 13:33:42.773579][15580.383484573], took 7.76min
2025-12-04T13:33:42.8205185Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-c099bcb3f2a041ec.xml
2025-12-04T13:33:42.8949140Z Running test_nestedtensor 4/4 ... [2025-12-04 13:33:42.894273][15580.504178771]
2025-12-04T13:33:42.8949847Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:33:42.8951755Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_nestedtensor.py', '--shard-id=4', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:33:42.894712]
2025-12-04T13:44:53.6410368Z 
2025-12-04T13:44:53.6411718Z test_nestedtensor 4/4 was successful, full logs can be found in artifacts with path test/test-reports/test_nestedtensor_4.4_fadd9c2633e00561_.log
2025-12-04T13:44:53.6648278Z Running 415 items in this shard: test/test_nestedtensor.py::TestNestedTensor::test_2d_nested_tensor_batch_size_2_max_seq_len_3_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_2_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_batch_size_4_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_5_vocab_size_10, test/test_nestedtensor.py::TestNestedTensor::test_3d_nested_tensor_float_batch_size_2_max_seq_len_5_vocab_size_20, test/test_nestedtensor.py::TestNestedTensor::test_default_nested_tensor, test/test_nestedtensor.py::TestNestedTensor::test_nested_namespace, test/test_nestedtensor.py::TestNestedTensor::test_numel, test/test_nestedtensor.py::TestNestedTensor::test_size, test/test_nestedtensor.py::TestNestedTensor::test_stride, test/test_nestedtensor.py::TestNestedTensor::test_unbind_dim, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cpu_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_bmm_cuda_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_contiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_device_checks_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_noncontiguous_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_dropout_strided_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_amax_dtypes_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmax_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_int32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_argmin_dtypes_cuda_uint8, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_bfloat16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_max_dtypes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_jagged_min_dtypes_cuda_int64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_layer_norm_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_masked_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_matmul_nt_with_broadcasted_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_add_in_place_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_128_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_128_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_dense_elementwise_embedding_dim_256_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_div_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_indexing_noncontiguous_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_mul_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_split_with_sizes_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_nested_tensor_sub_transpose_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_scaled_dot_product_attention_input_dim_3_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_False_weights_only_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_serialization_requires_grad_True_weights_only_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_softmax_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_squeeze_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim3_cuda_float64, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_dim4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_output_size_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_padded_tensor_zero_numel_errors_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_to_then_from_padded_tensor_no_transform0213_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_cuda_float16, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_transpose_inference_mode_interaction_cuda_float16, 
test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_abs__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_abs_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_isnan_cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_relu__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unary_funcs_tanh__cuda, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_unbind_noncontiguous_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float32, test/test_nestedtensor.py::TestNestedTensorDeviceTypeCUDA::test_view_cuda_float64, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_accumulate_grad_different_strides_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_as_nested_tensor_propagates_gradients_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_backward_add_strided_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_gelu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_128_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_5d_size_4_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_128_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_4_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_layer_norm_backward_size_512_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_masked_fill_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_bmm_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_matmul_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_reshape_backward_cuda, 
test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_softmax_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_transpose_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_nested_tensor_unsqueeze_gradcheck_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_selu_backward_cuda, test/test_nestedtensor.py::TestNestedTensorAutogradCUDA::test_split_with_sizes_flow_through_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_apply__cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_0_layout_strided_requires_grad_True_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_1_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_2_layout_strided_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_3_layout_strided_requires_grad_False_contiguous_True_cuda_float64, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_jagged_requires_grad_True_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_False_contiguous_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_as_nested_tensor_from_tensor_dim_4_layout_strided_requires_grad_True_contiguous_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_binary_pointwise_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_device_dtype_transfer_updates_offsets_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dropout_inference_mode_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_dummy_mha_with_nt_use_legacy_api_False_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_is_same_size_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_as_nested_tensor_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_layout_construction_nested_tensor_requires_grad_False_components_require_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_op_different_output_shape_dim_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_False_values_is_view_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_jagged_view_from_values_offsets_requires_grad_True_values_is_view_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_layer_norm_reduce_ragged_idx_1_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_empty_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_shape_randn_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_empty_like_cuda, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_ones_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_randint_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_randn_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_like_value_zeros_like_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_nested_tensor_activation_checkpoint_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_pointwise_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_noncontiguous_to_noncontig_with_holes_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_mean_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_batch_only_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_1_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_mean_transpose_offset_2_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_1_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_reduce_ragged_idx_greater_than_1_different_output_shape_sum_transpose_offset_2_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_False_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_transpose_non_ragged_dim_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_False_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_mean_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_op_dim_with_lengths_different_output_shape_sum_keepdim_True_requires_grad_True_components_require_grad_True_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_permute_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_reshape_decomp_requires_grad_False_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_backwards_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_compile_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sdpa_with_constant_sequence_length_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_serialization_noncontig_transposed_weights_only_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_reduce_ragged_idx_greater_than_1_same_output_shape_transpose_offset_2_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_False_log_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_False_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_requires_grad_True_components_require_grad_True_softmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_transpose_non_ragged_dim_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_dim_with_lengths_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_softmax_reduce_batch_dim_requires_grad_True_components_require_grad_False_softmax_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_batch_and_non_batch_keepdim_True_requires_grad_True_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_False_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_sum_dim_reduce_ragged_and_non_batch_keepdim_True_requires_grad_False_components_require_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_dtype_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_2_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_compile_nt_dim_3_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_2_requires_grad_False_cuda_bool, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_False_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float16, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_3_requires_grad_True_cuda_float64, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_to_padded_tensor_nt_dim_4_requires_grad_True_cuda_bool, 
test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_backward_cuda_float32, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_lengths_ragged_idx_0_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unbind_transpose_ragged_idx_2_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_unsafe_view_cuda, test/test_nestedtensor.py::TestNestedTensorSubclassCUDA::test_views_inherit_ragged_dim_cuda, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_masked_sum_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_mish_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_positive_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_special_i1e_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_squeeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_backward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___radd___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward___rdiv___cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cdouble_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_cfloat_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_chunk_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_erfc_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_linalg_vector_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_logit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mean_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_min_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_mvlgamma_mvlgamma_p_3_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_hardtanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_nn_functional_softsign_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_polygamma_polygamma_n_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rad2deg_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_round_decimals_0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_rsub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sgn_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_special_entr_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sqrt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sub_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tan_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_unsqueeze_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_backward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward___rsub___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_abs_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_add_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_argmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_atan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_clone_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_complex_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_conj_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_count_nonzero_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_eq_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ge_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_gt_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isfinite_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_isnan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_le_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_lgamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log1p_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_logical_and_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_logsumexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_select_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_std_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_masked_var_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_maximum_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_ne_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_hardshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_relu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_rms_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_nn_functional_softshrink_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_polygamma_polygamma_n_4_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_pow_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_round_decimals_neg_3_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_k0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_ndtri_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_polygamma_special_polygamma_n_0_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_square_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_std_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_tanh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_compile_forward_where_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward___rmod___cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_acosh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_all_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_amax_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_amin_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_angle_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_any_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_asinh_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_atan2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_bool_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_clamp_min_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_conj_physical_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_digamma_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_double_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_erfinv_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_exp_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_expm1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fill_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_floor_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_fmod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ge_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_i0_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isnan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isneginf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_isposinf_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_binary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_jiterator_unary_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_ldexp_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_le_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log10_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_log_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_norm_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_masked_sum_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_max_reduction_with_dim_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mean_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mul_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nan_to_num_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_celu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_elu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_linear_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_relu6_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_rrelu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_selu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_silu_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_nn_functional_softplus_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_polygamma_polygamma_n_2_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_prod_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_real_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_reciprocal_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_remainder_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_round_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_short_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_signbit_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_sinh_cuda_float32, 
test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_bessel_j1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_i1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_modified_bessel_k1_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_tan_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_true_divide_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_var_unbiased_cuda_float32, test/test_nestedtensor.py::TestNestedTensorOpInfoCUDA::test_forward_where_cuda_float32
2025-12-04T13:44:53.6881603Z 
2025-12-04T13:44:53.6881999Z Finished test_nestedtensor 4/4 ... [2025-12-04 13:44:53.641576][16251.251482241], took 11.18min
2025-12-04T13:44:53.6883171Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-b4c65009171fef32.xml
2025-12-04T13:44:53.8142483Z Running test_ops 5/11 ... [2025-12-04 13:44:53.813970][16251.423877342]
2025-12-04T13:44:53.8142979Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T13:44:53.8146570Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=5', '--num-shards=11', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 13:44:53.814438]
2025-12-04T14:05:21.9208174Z 
2025-12-04T14:05:21.9209043Z test_ops 5/11 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_5.11_352ce2577683b96d_.log
2025-12-04T14:05:22.0437740Z Running 3037 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_normal__in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cfloat_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randn_like_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing__chunk_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_angle_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_permuted_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mH_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mT_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nonzero_static_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_permute_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sum_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes___rmatmul___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__chunk_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_byte_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_polar_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_any_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clone_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_imag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_istft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_vecdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardtanh_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmm_decomposed_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_angle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_baddbmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_exp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft2_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_4inputs_with_extra_args_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_unary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_kron_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_singular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logdet_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_msort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_with_train_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bicubic_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_local_response_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_replicate_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_4_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resolve_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_list_args_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_transpose_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trapz_cuda, test/test_ops.py::TestCommonCUDA::test_errors_add_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cov_cuda, test/test_ops.py::TestCommonCUDA::test_errors_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_return_by_ref_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_adaptive_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_rms_norm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ormqr_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_blackman_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_exponential_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_randn_like_layout1_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_randn_like_layout2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_sum_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_zeros_like_layout0_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_laguerre_polynomial_l_cuda, test/test_ops.py::TestCommonCUDA::test_errors_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_qr_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_unbind_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_alias_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_item_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_channel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_reflect_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_torch_ops_aten__safe_softmax_default_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unravel_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_and_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_permuted_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_unary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_linalg_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsafe_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geqrf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_static_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pca_lowrank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_multiple_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__chunk_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sum_to_size_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_prod_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_outer_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_baddbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_full_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_inner_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_cholesky_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_pinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_matmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sgn_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_unbind_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unravel_index_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___ror___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_1d_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_permute_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_std_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_trace_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addbmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cummin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flip_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_grid_sampler_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hash_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logdet_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_prod_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_matmul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_median_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanmedian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_channel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_with_train_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_without_train_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_circular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_rms_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ormqr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_outer_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_put_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randint_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rot90_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_neg_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scalar_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_short_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_legendre_polynomial_p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_square_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_torch__scaled_mm_v2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_torch_ops_aten__flash_attention_forward_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_transpose_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unique_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_pointwise_tag_coverage_cuda, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float___rdiv___cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_mvlgamma_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sigmoid_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_he_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_legendre_polynomial_p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_u_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_block_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cauchy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_channel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_smooth_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_normal_number_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rad2deg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_split_with_sizes_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_geometric_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ne_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_item_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vecdot_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logaddexp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_smooth_l1_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_normal_number_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rad2deg_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_deg2rad_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_geometric_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vecdot_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_normal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pixel_unshuffle_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_count_nonzero_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumprod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_geometric_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_item_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rad2deg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_select_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_multiple_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_std_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__chunk_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bernoulli_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_torch_ops_aten__safe_softmax_default_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_complex64, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_power_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmedian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_where_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_count_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_half_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_le_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_vander_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_threshold_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addbmm_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isposinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mode_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_xlog1py_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bool_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cauchy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_permuted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hash_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_similarity_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_rms_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsafe_split_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_igammac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_solve_triangular_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logaddexp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_bessel_j1_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsafe_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsafe_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_vsplit_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_allclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atanh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_chunk_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_contiguous_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_normal__in_place_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ravel_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_take_along_dim_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_var_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vdot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_allclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_argwhere_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_chunk_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_column_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_permuted_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_eq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expm1_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_flipud_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_grad_oriented_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_logsumexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_var_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_linear_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_roll_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rot90_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scalar_tensor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tensordot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_trapezoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___radd___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__chunk_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_long_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diag_embed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_float_power_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_or_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_zeros_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_normal__in_place_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__unsafe_masked_index_put_accumulate_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_exp2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfft_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isfinite_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isnan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorsolve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumprod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_full_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize_as__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scatter_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_multiple_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensordot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsqueeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view__batch_norm_with_update_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_arange_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_column_stack_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_gt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_heaviside_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_item_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log1p_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_masked_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_positive_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_select_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_true_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_arange_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_argsort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_partial_views_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_asin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_byte_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cfloat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_floor_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_erfinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gradient_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_i0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_int_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cond_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svdvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_mean_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_var_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_movedim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_msort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mish_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_nearest_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_pinverse_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_put_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_quantile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signbit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_legendre_polynomial_p_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sqrt_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_take_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_xlogy_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_alias_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_imag_cuda_complex64, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_unary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_kl_div_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_permute_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hsplit_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_kl_div_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_transpose_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unbind_copy_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmin_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sort_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_permuted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_grid_sampler_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmin_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nonzero_static_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ones_like_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_t_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bincount_cuda_int64, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_int_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_similarity_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_h_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unsqueeze_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_tensor_overload_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_complex32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_zeros_cuda_int8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_logspace_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_float64, test/test_ops.py::TestTagsCUDA::test_tags___getitem___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rmul___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rpow___cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_byte_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_double_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_polar_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_or_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_contiguous_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erfc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_hsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logaddexp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_or_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_mul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softmin_with_dtype_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rot90_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_take_along_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_view_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atan2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_baddbmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_or_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_right_shift_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_cdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_inverse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_max_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clone_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_conj_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_trunc_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flip_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fliplr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_frexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_geqrf_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_hstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_hypot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isposinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_le_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lerp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_multi_dot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_not_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_matrix_exp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_reshape_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_blackman_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_exponential_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_slice_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_y0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_y1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unbind_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unravel_index_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_unsafe_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_as_real_cuda_complex64
2025-12-04T14:05:22.1633188Z 
2025-12-04T14:05:22.1633508Z Finished test_ops 5/11 ... [2025-12-04 14:05:21.924683][17479.53458861], took 20.47min
2025-12-04T14:05:22.1634534Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-9d1debb5033aecec.xml
2025-12-04T14:05:23.5040039Z Uploading artifacts took 1.37 seconds
2025-12-04T14:05:23.5044416Z Running test_ops 10/11 ... [2025-12-04 14:05:23.504261][17481.114167405]
2025-12-04T14:05:23.5044895Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:05:23.5049221Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_ops.py', '--shard-id=10', '--num-shards=11', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:05:23.504717]
2025-12-04T14:26:50.4287821Z 
2025-12-04T14:26:50.4288726Z test_ops 10/11 was successful, full logs can be found in artifacts with path test/test-reports/test_ops_10.11_9feb13593ea58df6_.log
2025-12-04T14:26:50.5496830Z Running 2991 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cov_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_normal_in_place_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_w_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsafe_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nanmean_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sinh_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_half_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_abs_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_normal__in_place_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ravel_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_softmax_with_dtype_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_stft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cauchy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_frexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_geqrf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ldexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorsolve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vecdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logspace_tensor_overload_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_argmin_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_one_hot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_airy_ai_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_svd_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trapezoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tril_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_true_divide_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zero__cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_errors_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_errors_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diff_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_errors_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_errors_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_errors_mul_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ne_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_randn_like_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_sum_layout1_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sparse_zeros_like_layout4_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sub_cuda, test/test_ops.py::TestCommonCUDA::test_errors_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch__chunk_cat_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_nn_functional_logsigmoid_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_meta_consistency_out_dtype_mismatch_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__unsafe_masked_index_put_accumulate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagflat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_grid_sampler_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mH_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_log_softmax_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randn_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_aminmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ldexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mT_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_circular_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nonzero_static_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cauchy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_bilinear_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensordot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_item_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_item_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vecdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_gaussian_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_chalf_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cauchy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_asin_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_kl_div_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_permute_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expand_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_expm1_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_matrix_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linalg_svd_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_unsqueeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_requires_grad_error_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sum_to_size_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___rand___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__native_batch_norm_legit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_float_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_abs_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_acos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_any_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_block_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_erf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_nll_loss_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_normal__in_place_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_squeeze_multiple_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__unsafe_masked_index_put_accumulate_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bincount_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_einsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_float_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_item_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_unary_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cond_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_unpack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_constant_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randint_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_log_ndtr_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_polygamma_special_polygamma_n_0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_std_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_stft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_take_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_torch_ops_aten__efficient_attention_forward_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triangular_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_asinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_ldexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log1p_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_masked_std_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_polygamma_polygamma_n_4_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_rad2deg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_v_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_hermite_polynomial_h_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_legendre_polynomial_p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_u_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_sqrt_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_promotes_int_to_float_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_alias_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_count_nonzero_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumprod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_deg2rad_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float8_e4m3fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_geometric_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_item_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logaddexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pixel_unshuffle_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_renorm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_multiple_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_take_along_dim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs__conversions_complex_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_alias_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_block_diag_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_count_nonzero_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumprod_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_strided_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_equal_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float8_e5m2, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float8_e5m2fnuz, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_istft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_cross_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_diagonal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vecdot_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_tensor_overload_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_normal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_channel_shuffle_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_shuffle_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pixel_unshuffle_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_smooth_l1_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_select_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_split_with_sizes_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_multiple_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_take_along_dim_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_alias_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_count_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumprod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_equal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exponential_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exponential_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_item_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_tensor_overload_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_channel_shuffle_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_normal_number_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_renorm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_select_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_split_with_sizes_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_take_along_dim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_copy_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_alias_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_block_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cauchy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_cross_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_tensor_overload_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logaddexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_tensor_overload_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_channel_shuffle_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_smooth_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_split_with_sizes_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_take_along_dim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_reduction_ops_reduce_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_alias_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_permuted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_geometric_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hash_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_laguerre_polynomial_l_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsafe_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rdiv___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_embed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftn_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_normal_number_mean_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__batch_norm_with_update_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_alias_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_ge_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_item_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logaddexp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_dropout2d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multi_head_attention_forward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_cow_input_uniform_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__chunk_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bucketize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_and_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_multinomial_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_renorm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__unsafe_masked_index_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_arange_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_asinh_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isposinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lerp_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nanquantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_smooth_l1_loss_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_log_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_multiple_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unbind_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___rdiv___cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay___rmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_isclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_logical_or_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_masked_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_new_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_signal_windows_kaiser_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_tril_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unbind_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_view_replay_zeros_like_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view_H_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_column_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_movedim_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_renorm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sgn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_std_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_transpose_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bfloat16_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bool_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dist_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_diagonal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vander_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vecdot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logspace_tensor_overload_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nanmean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose1d_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_constant_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_inf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ormqr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_permute_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_permute_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_randn_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rsqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_split_with_sizes_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_char_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eq_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_exp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expm1_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isclose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isnan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_item_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_lerp_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log1p_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_pow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_split_with_sizes_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_var_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcdiv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bfloat16_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cfloat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_clone_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_corrcoef_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_count_nonzero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diag_embed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_double_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cholesky_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_inv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_rank_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_qr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_svdvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorinv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_xor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_long_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_unpack_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_replicate_negative_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_roll_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_transpose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unflatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_uniform_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zeros_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rsub___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bool_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_char_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_half_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_int_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_digamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_dsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hypot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isnan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logspace_tensor_overload_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_normal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_renorm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_stft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unsqueeze_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_var_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vsplit_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__segment_reduce_lengths_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__segment_reduce_offsets_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__softmax_backward_data_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__unsafe_masked_index_put_accumulate_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addbmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addcdiv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atan2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_block_diag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bucketize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_max_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_eq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_exp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_exp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_grid_sampler_3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_item_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vector_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lt_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_matmul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_binary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_multinomial_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cross_entropy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_bag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_glu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_nearest-exact_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_fro_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_inf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_in_place_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_renorm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resize__cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rsqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sparse_mm_reduce_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_unbiased_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fftshift_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_histc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_celu_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_torch_ops_aten__efficient_attention_forward_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_shapes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kron_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_replicate_negative_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pairwise_distance_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unsafe_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_amin_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_rms_norm_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flatten_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_heaviside_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pairwise_distance_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_permute_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_searchsorted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_split_with_sizes_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_ops.py::TestFakeTensorCUDA::test_fake_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unravel_index_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___radd___cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rxor___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__chunk_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__unsafe_masked_index_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__upsample_bilinear2d_aa_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_alias_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_arange_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cauchy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flatten_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_item_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool3d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_in_place_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_torch_ops_aten__efficient_attention_forward_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unsafe_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_arange_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_linspace_tensor_overload_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_logspace_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout__refs_ones_cuda_uint8, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_bfloat16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_arange_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_float16, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_linspace_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_ones_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_int32, test/test_ops.py::TestFakeTensorCUDA::test_strided_layout_zeros_cuda_uint8, test/test_ops.py::TestTagsCUDA::test_tags___rand___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags___rmod___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_acosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addr_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_all_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diag_embed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erfinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_floor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_frac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_igammac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isfinite_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_item_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_vecdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log10_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_not_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logspace_tensor_overload_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_native_layer_norm_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_renorm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_logit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sum_to_size_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unflatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_where_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__segment_reduce_lengths_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_shapes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cartesian_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_combinations_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_complex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumprod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_einsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_empty_strided_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_expm1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_igamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isfinite_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ldexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eigh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_inv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lstsq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_slogdet_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log1p_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_normal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_long_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_logaddexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_empty_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_elu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_gaussian_nll_loss_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_glu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_normal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_normal_in_place_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_pca_lowrank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_permute_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_put_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_qr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randint_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_reshape_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rsub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_searchsorted_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_list_args_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_with_sizes_copy_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_squeeze_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_take_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tensor_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_uniform_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_mean_unbiased_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_unbiased_cuda_float32, test/test_ops.py::TestForwardADWithScalarsCUDA::test_0d_tensor_with_python_scalar_div_no_rounding_mode_cuda_float32
2025-12-04T14:26:50.6673467Z 
2025-12-04T14:26:50.6673781Z Finished test_ops 10/11 ... [2025-12-04 14:26:50.432867][18768.04277075], took 21.45min
2025-12-04T14:26:50.6674795Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-9b78a46860708967.xml
2025-12-04T14:26:51.8004059Z Uploading artifacts took 1.18 seconds
2025-12-04T14:26:51.8007955Z Running functorch/test_ops 2/7 ... [2025-12-04 14:26:51.800613][18769.410519202]
2025-12-04T14:26:51.8008488Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:26:51.8013194Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=2', '--num-shards=7', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:26:51.801061]
2025-12-04T14:38:56.6341957Z 
2025-12-04T14:38:56.6342940Z functorch/test_ops 2/7 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_2.7_066e83f50e6dcbea_.log
2025-12-04T14:38:56.7026543Z Running 1440 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_extremal_numerics_cross_entropy_cuda, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_expm1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_eigvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_cumsum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_local_response_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_mse_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_squeeze_multiple_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_copysign_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_or_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_interpolate_nearest-exact_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_kaiser_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___getitem___functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp__segment_reduce_offsets_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_strided_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_full_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_tensorsolve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log1p_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_logsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool2d_grad_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_bessel_j1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_bool_raises_ceil_cuda_bool, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_amin_cuda_complex128, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_argmin_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_floor_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_maximum_cuda_complex64, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_minimum_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex128, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_topk_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_broadcast_to_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_contiguous_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_expand_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_expand_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_list_args_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_movedim_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_narrow_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_select_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_as_complex_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_add_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_eq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_erfc_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mvlgamma_mvlgamma_p_3_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_pinverse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_take_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vdot_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagonal_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_flipud_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_geqrf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_unary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_rank_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_meshgrid_variadic_tensors_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_randn_like_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_bessel_y0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_to_sparse_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__softmax_backward_data_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cummin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_full_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_native_batch_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nonzero_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_zeta_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SortGenVmapAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rpow___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__batch_norm_with_update_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__softmax_backward_data_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmv_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_argsort_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bool_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdist_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_conj_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_contiguous_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_corrcoef_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_no_rounding_mode_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_double_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_expand_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_eye_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frac_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_geqrf_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_histc_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hstack_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_hypot_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igammac_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_inner_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_int_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isclose_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_item_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_unary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_kthvalue_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_le_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cond_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvals_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_householder_product_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lstsq_grad_oriented_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_factor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_qr_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_tensorinv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_and_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_long_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_amin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_argmax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_std_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_median_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_meshgrid_variadic_tensors_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_multinomial_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_narrow_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_ones_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool3d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_avg_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_elu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_embedding_bag_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_fractional_max_pool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_group_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardswish_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardtanh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hinge_embedding_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_kl_div_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pairwise_distance_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_relu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nonzero_static_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ones_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pinverse_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reciprocal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize__cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rot90_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_amax_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sign_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_slice_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_erfcx_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_ndtr_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_square_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_mean_unbiased_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_stft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_svd_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trace_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tril_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_true_divide_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unique_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsqueeze_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_var_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_xlogy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_atleast_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_double_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_char_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_double_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_lstsq_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_tensorinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_sum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_dropout3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_hinge_embedding_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_triplet_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polar_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_zeta_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_det_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_reflect_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_upsample_bilinear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rand_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_hermite_polynomial_h_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___rmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bool_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_deg2rad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_svd_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_alpha_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_ctc_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_selu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_rsub_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_select_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_short_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_torch_ops_aten__efficient_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unbind_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_SelectGenVmapAutogradFunction_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__softmax_backward_data_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_acos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_addmv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_block_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cfloat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fftn_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_half_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_asinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_broadcast_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cummax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diag_embed_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_item_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_le_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_long_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_matrix_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mvlgamma_mvlgamma_p_5_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_embedding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_local_response_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multilabel_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_round_decimals_neg_3_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_ndtri_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unique_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lstsq_grad_oriented_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_lu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_meshgrid_list_of_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_5_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_binary_cross_entropy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_instance_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_pca_lowrank_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_resize__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_var_mean_unbiased_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_where_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__segment_reduce_lengths_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp__unsafe_masked_index_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addmm_decomposed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argwhere_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_floor_rounding_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_div_trunc_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_flatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_lu_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_matrix_rank_hermitian_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_msort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_native_dropout_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_glu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_upsample_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_positive_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_trunc_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_zeros_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyTakeAutogradFunction_cuda_float32
2025-12-04T14:38:56.7693154Z 
2025-12-04T14:38:56.7693522Z Finished functorch/test_ops 2/7 ... [2025-12-04 14:38:56.636311][19494.246216621], took 12.08min
2025-12-04T14:38:56.7694714Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-bd6912e48e96c8e4.xml
2025-12-04T14:38:56.7809992Z Running functorch/test_ops 7/7 ... [2025-12-04 14:38:56.780708][19494.390614885]
2025-12-04T14:38:56.7810543Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:38:56.7813745Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ops.py', '--shard-id=7', '--num-shards=7', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:38:56.781152]
2025-12-04T14:50:34.1726930Z 
2025-12-04T14:50:34.1727898Z functorch/test_ops 7/7 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ops_7.7_c87f7efa94ae13b4_.log
2025-12-04T14:50:34.2406387Z Running 1436 items in this shard: test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_NumpySortAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_addcdiv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_allclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_baddbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_column_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_conj_physical_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_empty_permuted_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_frac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_isreal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_lu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_max_reduction_no_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_movedim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_dropout2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_normal_number_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_polygamma_polygamma_n_1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sgn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_sort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_entr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_grad_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ZeroGradientsGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp__segment_reduce_offsets_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cartesian_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_contiguous_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_exp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_eye_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_fftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_irfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_float_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_geometric_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_hsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nextafter_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_celu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_pixel_unshuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_nn_functional_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_signbit_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_sub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_unfold_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpjvpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_addr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_aminmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_any_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_bernoulli_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_block_diag_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_char_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_fft_rfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_gt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_hstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_2inputs_2outputs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_jiterator_unary_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_lerp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_log_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logaddexp2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_matmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_max_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_min_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_new_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_adaptive_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_bilinear_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_max_unpool1d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_nn_functional_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_normal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_polygamma_polygamma_n_4_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_resolve_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scalar_tensor_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_hermite_polynomial_h_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_sqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_true_divide_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_view_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjp_xlogy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvjpvmap_SelectAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_jvpvmapvmap_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_ordered_complex_raises_clamp_cuda_complex32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_conj_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_hsplit_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_list_return_split_list_args_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_mH_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_resolve_neg_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_squeeze_multiple_grad_op_vjp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_transpose_grad_op_vjp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_unsqueeze_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_view_then_inplace_view_as_grad_op_jvp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__chunk_cat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_expm1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_geometric_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_isposinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_eig_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_norm_subgradients_at_zero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_linspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_lt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_sum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_masked_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nanquantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_huber_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_randn_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_roll_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_rsub_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_scatter_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_signal_windows_hann_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_erfcx_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i0e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_modified_bessel_i1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unbind_copy_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_vsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rdiv___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp___rmatmul___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_angle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_atanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_byte_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_char_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_cummin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_einsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_empty_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_erfc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_ihfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_int_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_ldl_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_log_softmax_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_mv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv2d_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_conv_transpose3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_glu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_grid_sample_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_group_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_leaky_relu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_rrelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_polar_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_resolve_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_round_decimals_neg_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_modified_bessel_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_special_scaled_modified_bessel_k0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_split_list_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_std_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjp_zeros_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvjpvmap_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap___rpow___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_bfloat16_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cholesky_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_dsplit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_erf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_ihfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_fft_rfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_fill_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_isnan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_matrix_rank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_pinv_hermitian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_slogdet_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vander_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_masked_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_matmul_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_max_pool2d_with_indices_backward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_min_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_mm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_new_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_fractional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_gelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_pool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_relu6_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_tanhshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_nn_functional_unfold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_permute_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_scatter_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_kaiser_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_signal_windows_nuttall_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_softmax_with_dtype_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_special_legendre_polynomial_p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_split_with_sizes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_trapz_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_unflatten_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_view_as_complex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_vstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmap_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vjpvmapvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_H_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpySortAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_NumpyTakeAutogradFunction_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ScaleGradGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_T_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rmatmul___cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad___rsub___cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__unsafe_masked_index_put_accumulate_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad__upsample_bilinear2d_aa_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_add_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addbmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_addmv_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_all_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_angle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_partial_views_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_asinh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atan2_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cdouble_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_chalf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_copysign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_cumulative_trapezoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diag_embed_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diagonal_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_diff_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_div_floor_rounding_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_dstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_empty_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exp_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_fftshift_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_hfftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_ifftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_irfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfft_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fft_rfftn_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fliplr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_flipud_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_floor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_fmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_full_like_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_gt_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_igammac_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_index_select_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_4inputs_with_extra_args_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_jiterator_binary_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_le_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_lerp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cholesky_ex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_cross_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_diagonal_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_inv_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_ldl_factor_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_lu_solve_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_matrix_rank_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_svdvals_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vector_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_linalg_vector_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log1p_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_not_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logical_xor_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_logspace_tensor_overload_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_cumsum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_masked_softmin_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_matmul_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_pool2d_with_indices_backward_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_max_reduction_no_dim_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mean_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_mvlgamma_mvlgamma_p_5_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nansum_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_dropout_backward_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_new_empty_strided_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_adaptive_avg_pool1d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_groups_with_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_no_bias_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_cross_entropy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_ctc_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_grid_sample_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardshrink_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_hinge_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bicubic_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_interpolate_nearest-exact_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_layer_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_logsigmoid_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_mse_loss_functorch_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_normalize_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pad_circular_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_pixel_unshuffle_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_prelu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_rms_norm_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_silu_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_softsign_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_nn_functional_triplet_margin_loss_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_norm_nuc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_in_place_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_normal_number_mean_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pca_lowrank_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_permute_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_polygamma_polygamma_n_0_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_ravel_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_repeat_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_reshape_as_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_resize_as__cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_round_decimals_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_rsqrt_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_scatter_reduce_prod_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_searchsorted_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_select_scatter_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_short_functorch_no_channels_last_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_cosine_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signal_windows_kaiser_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_signbit_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sinh_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sparse_mm_reduce_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_bessel_j1_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_i1e_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_ndtri_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_scaled_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_shifted_chebyshev_polynomial_t_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_split_with_sizes_cuda_float64, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_sum_to_size_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_take_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_transpose_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unfold_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_unsafe_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_as_complex_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_copy_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_view_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmap_autograd_grad_vstack_cuda_float64, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall___getitem___cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_argsort_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bool_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cholesky_inverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cos_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_cumulative_trapezoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_diff_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_double_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_expand_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifft2_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_fft_ifft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_full_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_gradient_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__batch_norm_with_update_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_arange_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_bool_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_cosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_deg2rad_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_diag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_digamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_div_floor_rounding_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_div_no_rounding_mode_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_expand_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fft_irfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_frexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_index_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isfinite_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_2inputs_2outputs_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_jiterator_binary_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_ldl_factor_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_masked_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_matrix_exp_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_native_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nextafter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_adaptive_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv2d_stride_padding_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_dropout_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_gaussian_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_max_unpool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_mse_loss_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_pad_replicate_negative_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nn_functional_softsign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_ones_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_pinverse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_randint_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_scatter_reduce_sum_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_short_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_signal_windows_general_hamming_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sparse_sampled_addmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_bessel_y1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_stft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_tanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_topk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_vdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_where_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_has_batch_rule_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cond_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_cross_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_inv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_lu_factor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_linalg_svdvals_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log10_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_fill_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_max_reduction_with_dim_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ne_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_new_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_adaptive_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_groups_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv2d_stride_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_conv_transpose2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_cosine_embedding_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_embedding_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_gelu_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardswish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_hardtanh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_mish_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_circular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_pad_constant_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nn_functional_threshold_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_nonzero_static_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_norm_inf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ones_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_polygamma_polygamma_n_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_remainder_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_resize_as__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_0_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_rsub_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_signal_windows_general_cosine_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sinc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_sparse_mm_reduce_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_hermite_polynomial_he_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_log_ndtr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_polygamma_special_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_tile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_transpose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_var_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_vdot_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpall_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_NumpyCubeNotComposableAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_SelectGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp___getitem___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp__native_batch_norm_legit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_alias_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_atan_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_bfloat16_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_broadcast_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clamp_max_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_clone_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_constant_pad_nd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_corrcoef_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_cov_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_diagonal_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_digamma_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_equal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_fft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifftn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fft_irfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_floor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_fmod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_grid_sampler_2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_index_reduce_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_int_functorch_no_channels_last_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_isclose_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_eigvalsh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_pinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_log_softmax_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_logspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_fill_functorch_Scalar_only_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_mvlgamma_mvlgamma_p_1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nan_to_num_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nanmean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_narrow_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_channel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_conv2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_embedding_bag_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_glu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_bilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_interpolate_trilinear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_max_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_normalize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_pdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_rms_norm_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_nn_functional_softplus_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_norm_fro_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ops_aten_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_randint_like_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_renorm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_round_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_round_decimals_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_searchsorted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signal_windows_exponential_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_slice_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_airy_ai_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_chebyshev_polynomial_u_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_special_shifted_chebyshev_polynomial_w_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_squeeze_multiple_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_svd_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_take_along_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tensordot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_to_sparse_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_transpose_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_triangular_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_tril_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unique_consecutive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_unsafe_chunk_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_var_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_view_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvjp_zeros_like_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_NumpyCubeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapjvpvmap_SortGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_CubeGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ForwardHasDefaultArgsAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_NumpyTakeAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp__segment_reduce_offsets_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_abs_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_partial_views_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_as_strided_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_broadcast_shapes_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_bucketize_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cauchy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ceil_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_conj_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_count_nonzero_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_cov_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_erfinv_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_exp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_fft_ifftshift_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_flip_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_geometric_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_grid_sampler_3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_half_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_NumpyMulAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule___radd___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule__unsafe_masked_index_put_accumulate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_bmm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_byte_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_clamp_min_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_combinations_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_contiguous_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_dstack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_empty_permuted_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_fft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_fft_ifft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_float_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_float_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ge_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_histc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_amin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_index_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isinf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_isneginf_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_jiterator_binary_return_by_ref_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_kthvalue_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_cholesky_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_diagonal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_householder_product_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_factor_ex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_ldl_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_lstsq_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_multi_dot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_solve_triangular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_svd_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_linalg_vecdot_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_long_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_cumsum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_masked_select_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_multinomial_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_avg_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_adaptive_max_pool2d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_avg_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_batch_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_binary_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_cross_entropy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_kl_div_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_multi_head_attention_forward_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_poisson_nll_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_silu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_soft_margin_loss_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softmin_with_dtype_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ormqr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polar_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_positive_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_put_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_quantile_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_ravel_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_reciprocal_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_scatter_reduce_sum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_sign_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signal_windows_blackman_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_signbit_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_chebyshev_polynomial_v_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_laguerre_polynomial_l_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_scaled_modified_bessel_k1_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_split_with_sizes_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_stack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_tensor_split_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_torch_ops_aten__safe_softmax_default_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_trunc_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_uniform_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_xlogy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_has_batch_rule_zero__cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_index_put_functorch_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_int_functorch_no_channels_last_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_jiterator_4inputs_with_extra_args_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_kron_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_lgamma_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_cholesky_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_det_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_matrix_power_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_linalg_pinv_singular_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_log1p_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logcumsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logical_and_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_logspace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mH_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_argmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_logsumexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_masked_softmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_minimum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_mvlgamma_mvlgamma_p_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nan_to_num_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_neg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_avg_pool3d_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_adaptive_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_avg_pool2d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv2d_stride_no_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_conv_transpose1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_hardshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_area_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_interpolate_bicubic_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_max_unpool3d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_rms_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_scaled_dot_product_attention_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_smooth_l1_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_nn_functional_softshrink_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_normal_in_place_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_ops_aten__new_zeros_with_same_feature_meta_functorchonly_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_outer_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_positive_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_rad2deg_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_repeat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_reshape_as_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scalar_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_scatter_reduce_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_slice_scatter_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_bessel_j1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_modified_bessel_k1_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_shifted_chebyshev_polynomial_u_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_xlog1py_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_special_zeta_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_squeeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_std_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_t_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_trace_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_view_as_complex_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjp_zeros_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rmod___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp___rsub___cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_acosh_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_add_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_addcmul_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_as_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_atleast_1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_cdist_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_clamp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diag_embed_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_diagflat_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_empty_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_expand_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_hfft_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_fft_rfft2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_gather_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_hash_tensor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_heaviside_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_i0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_igamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_index_reduce_amin_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_inner_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_ldexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lgamma_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_qr_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linalg_solve_ex_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_linspace_tensor_overload_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log2_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_log_softmax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_logical_xor_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_solve_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_lu_unpack_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mT_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_argmin_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_cumprod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_logaddexp_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_masked_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_max_reduction_no_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_median_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_meshgrid_variadic_tensors_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_min_reduction_with_dim_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_mvlgamma_mvlgamma_p_5_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nanmedian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nansum_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_narrow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_native_layer_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_new_empty_strided_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_depthwise_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv2d_stride_padding_with_bias_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_conv3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_cosine_similarity_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_ctc_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_elu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_fractional_max_pool3d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_hardsigmoid_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_linear_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_interpolate_nearest_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_margin_ranking_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_pool1d_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_max_unpool2d_grad_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mish_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_mse_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_multi_margin_loss_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pad_replicate_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_nn_functional_pixel_shuffle_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_norm_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_pca_lowrank_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_polygamma_polygamma_n_3_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_pow_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_prod_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_randn_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_real_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_repeat_interleave_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_reshape_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_rot90_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_scatter_reduce_amax_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_bartlett_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_signal_windows_gaussian_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_chebyshev_polynomial_t_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_shifted_chebyshev_polynomial_v_cuda_float32, 
test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_special_spherical_bessel_j0_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_square_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_squeeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_std_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_std_mean_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_sum_to_size_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_to_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_triu_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unbind_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsqueeze_copy_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_unsqueeze_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_mean_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvjp_var_unbiased_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_MulGenVmapAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_NumpyExpMarkDirtyAutogradFunction_cuda_float32, test/functorch/test_ops.py::TestOperatorsCUDA::test_vmapvjpvmap_ScaleGradGenVmapAutogradFunction_cuda_float32
2025-12-04T14:50:34.3067721Z 
2025-12-04T14:50:34.3068084Z Finished functorch/test_ops 7/7 ... [2025-12-04 14:50:34.174764][20191.784668984], took 11.62min
2025-12-04T14:50:34.3069254Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-da40a8ab5c416f48.xml
2025-12-04T14:50:35.5274476Z Uploading artifacts took 1.17 seconds
2025-12-04T14:50:35.5278397Z Running inductor/test_max_autotune 1/1 ... [2025-12-04 14:50:35.527663][20193.137569045]
2025-12-04T14:50:35.5279008Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:50:35.5283365Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_max_autotune.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:50:35.528095]
2025-12-04T14:50:45.1431166Z 
2025-12-04T14:50:45.1432197Z inductor/test_max_autotune 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_max_autotune_1.1_dc9c21bc2c4ad5fc_.log
2025-12-04T14:50:45.1433019Z 
2025-12-04T14:50:45.1433372Z Finished inductor/test_max_autotune 1/1 ... [2025-12-04 14:50:45.142893][20202.75280313], took 0.16min
2025-12-04T14:50:45.1712884Z Running inductor/test_cpu_repro 3/3 ... [2025-12-04 14:50:45.171057][20202.7809665]
2025-12-04T14:50:45.1713440Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T14:50:45.1716878Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpu_repro.py', '--shard-id=3', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 14:50:45.171434]
2025-12-04T15:03:50.0355939Z 
2025-12-04T15:03:50.0357404Z inductor/test_cpu_repro 3/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_repro_3.3_41613d465af9d6d5_.log
2025-12-04T15:03:50.0531902Z Running 230 items in this shard: test/inductor/test_cpu_repro.py::CPUReproTests::test_acosh_with_negative_large_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_argmax_argmin_with_nan_value, test/inductor/test_cpu_repro.py::CPUReproTests::test_argmin, test/inductor/test_cpu_repro.py::CPUReproTests::test_atomic_add_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_bitwise_right_shift, test/inductor/test_cpu_repro.py::CPUReproTests::test_bool_reduction_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_broadcast_mul_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_cat_mul, test/inductor/test_cpu_repro.py::CPUReproTests::test_channel_shuffle_cl_output, test/inductor/test_cpu_repro.py::CPUReproTests::test_complex_memory_overlap, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv2d_bn_mixed_dtype, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv_transpose2d_has_output_size_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_conv_transpose2d_packed_cpu, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_double_to_fp32_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_fp32_to_double_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_int64_to_int32_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_convert_int8_to_half_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_cpu_vec_cosim, test/inductor/test_cpu_repro.py::CPUReproTests::test_decomposed_dequant_relu_quant_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_maxpool2d_lowering_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_fp8_e4m3, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_quant_lowering_fp8_e5m2, test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_relu_quant_dequant_relu_quant_lowering_int8, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_dequant_relu_quant_dequant_relu_quant_lowering_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_disabled_amp_is_inference_True, test/inductor/test_cpu_repro.py::CPUReproTests::test_dropout, test/inductor/test_cpu_repro.py::CPUReproTests::test_embedding_vec_bf16, test/inductor/test_cpu_repro.py::CPUReproTests::test_expr_vec_non_contiguous, test/inductor/test_cpu_repro.py::CPUReproTests::test_float32_to_uint8, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp32_load_with_to_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp8_cast_bfloat16_shape_15,3,13, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp8_cast_bfloat16_shape_4,2048,4096, test/inductor/test_cpu_repro.py::CPUReproTests::test_fp8_cast_float32_shape_4,2048,4096, test/inductor/test_cpu_repro.py::CPUReproTests::test_fractional_max_pool2d_3d_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_full_boolean_dynamic_shape, test/inductor/test_cpu_repro.py::CPUReproTests::test_fused_attention_conv, test/inductor/test_cpu_repro.py::CPUReproTests::test_group_norm_large_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_group_norm_large_size, test/inductor/test_cpu_repro.py::CPUReproTests::test_group_norm_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_highp_to_lowp_cse_var_cache_with_store, test/inductor/test_cpu_repro.py::CPUReproTests::test_horizontal_fusion, test/inductor/test_cpu_repro.py::CPUReproTests::test_index_put2, test/inductor/test_cpu_repro.py::CPUReproTests::test_int64_pointwise_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_int_div_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_invalid_dropout_args, test/inductor/test_cpu_repro.py::CPUReproTests::test_linear_buffer_reuse, test/inductor/test_cpu_repro.py::CPUReproTests::test_linear_float64, test/inductor/test_cpu_repro.py::CPUReproTests::test_linear_packed, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_linear_with_no_default_contiguous_input, test/inductor/test_cpu_repro.py::CPUReproTests::test_local_buffer_with_line_reuse, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_False_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_1_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_False_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_False_empty_state_True_batch_first_True_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_1_bidirectional_True_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_False_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_False_batch_size_7_seq_len_1, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_False_bias_True_empty_state_True_batch_first_True_batch_size_7_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_False_empty_state_False_batch_first_False_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_False_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_False_batch_first_True_batch_size_1_seq_len_1, test/inductor/test_cpu_repro.py::CPUReproTests::test_lstm_packed_unbatched_True_input_size_7_hidden_size_7_num_layers_7_bidirectional_True_bias_True_empty_state_True_batch_first_True_batch_size_1_seq_len_7, test/inductor/test_cpu_repro.py::CPUReproTests::test_max_reduction_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_meta_device, test/inductor/test_cpu_repro.py::CPUReproTests::test_module_buffer_mutation, test/inductor/test_cpu_repro.py::CPUReproTests::test_new_vec_op_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_nn_fold, test/inductor/test_cpu_repro.py::CPUReproTests::test_non_contiguous_reduction_store, test/inductor/test_cpu_repro.py::CPUReproTests::test_outer_looop_fusion_with_local_buf, test/inductor/test_cpu_repro.py::CPUReproTests::test_pow_cos, test/inductor/test_cpu_repro.py::CPUReproTests::test_relu_permute_reshape_reinterpret_view, test/inductor/test_cpu_repro.py::CPUReproTests::test_repeated_exp, test/inductor/test_cpu_repro.py::CPUReproTests::test_require_stride_order_non_owning, test/inductor/test_cpu_repro.py::CPUReproTests::test_scatter_using_atomic_add_vec, 
test/inductor/test_cpu_repro.py::CPUReproTests::test_set_source_Tensor, test/inductor/test_cpu_repro.py::CPUReproTests::test_sigmoid_with_reduction, test/inductor/test_cpu_repro.py::CPUReproTests::test_symbolic_shape_scalar_value_reduction, test/inductor/test_cpu_repro.py::CPUReproTests::test_tanh_atan2, test/inductor/test_cpu_repro.py::CPUReproTests::test_tanh_atan2_use_decompose_tanh, test/inductor/test_cpu_repro.py::CPUReproTests::test_tile2d_load_decomposed_dequant_add_relu_quant_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_tile2d_store_channel_shuffle_cl_quant_output_int8, test/inductor/test_cpu_repro.py::CPUReproTests::test_to_dtype_bool_float, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_mxn_32_32_bf16_fp16, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_sum_outer, test/inductor/test_cpu_repro.py::CPUReproTests::test_transpose_with_norm, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint32_reduction_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint64_reduction_vec, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint8_add, test/inductor/test_cpu_repro.py::CPUReproTests::test_uint8_sub, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_contiguous_ModularIndexing, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_kernel_cpu_only, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_remainder, test/inductor/test_cpu_repro.py::CPUReproTests::test_vec_transpose_lowp_fp, test/inductor/test_cpu_repro.py::CPUReproTests::test_vertical_sum_cpu_only
2025-12-04T15:03:50.0702239Z 
2025-12-04T15:03:50.0702626Z Finished inductor/test_cpu_repro 3/3 ... [2025-12-04 15:03:50.036011][20987.64591466], took 13.08min
2025-12-04T15:03:50.0703863Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-5dd5f1708cbcb0aa.xml
2025-12-04T15:03:50.1745435Z Running inductor/test_mkldnn_pattern_matcher 2/3 ... [2025-12-04 15:03:50.174248][20987.784155877]
2025-12-04T15:03:50.1746059Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:03:50.1749443Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_mkldnn_pattern_matcher.py', '--shard-id=2', '--num-shards=3', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:03:50.174672]
2025-12-04T15:10:35.2469615Z 
2025-12-04T15:10:35.2471893Z inductor/test_mkldnn_pattern_matcher 2/3 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_mkldnn_pattern_matcher_2.3_52e8559de495a0be_.log
2025-12-04T15:10:35.2542357Z Running 99 items in this shard: test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_conv2d_binary_inplace_fusion_pass_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_False_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_False_float32_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_True, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_False_M_32_inplace_add_True_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_bfloat16_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_False_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_False_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_1_inplace_add_False_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_1_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_False_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_1_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_32_inplace_add_False_expand_a_scale_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_da8w8_sym_act_sym_wgt_with_int_mm_has_bias_True_float32_dynamic_True_reshape_a_True_M_32_inplace_add_True_expand_a_scale_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_dynamic_qlinear_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_hardtanh_pattern_fallback, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_binary, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_dynamic_fp16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_linear_unary, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_multi_linear_share_same_input, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_add, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_hardtanh, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qat_qconv2d_relu6, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_3, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_relu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_add_relu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_dequant_promotion_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardswish_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_hardtanh_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_int8_mixed_bf16_use_autocast, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_relu6_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_silu_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_silu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qconv2d_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_cpu_use_relu_True_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_False_is_qat_False_is_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_False_is_qat_True_is_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_add_int8_mixed_bf16_use_relu_True_is_qat_True_is_dynamic_False, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_dequant_promotion_int8_mixed_bf16_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_fp8_inductor_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_gelu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_input_dim_exceeds_2, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_and_not_contiguous, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_int8_mixed_bf16_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_mul_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_int8_mixed_bf16_input_dim_exceeds_2_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_relu_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_sum_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qlinear_xpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_qmaxpool2d, 
test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_reproduce_121253_issue_addmm_fusion_check, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_bfloat16_per_channel_quant_True_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_float32_per_channel_quant_False_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_False_float32_per_channel_quant_False_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_bfloat16_per_channel_quant_False_dynamic_False, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_smooth_quant_with_int_mm_has_bias_True_float32_per_channel_quant_False_dynamic_True, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_woq_int4_cpu, test/inductor/test_mkldnn_pattern_matcher.py::TestPatternMatcher::test_woq_int8, test/inductor/test_mkldnn_pattern_matcher.py::TestDynamicPatternMatcher::test_q_attention_block
2025-12-04T15:10:35.2610033Z 
2025-12-04T15:10:35.2610453Z Finished inductor/test_mkldnn_pattern_matcher 2/3 ... [2025-12-04 15:10:35.246932][21392.856840636], took 6.75min
2025-12-04T15:10:35.2756013Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-85c358a1ca92a817.xml
2025-12-04T15:10:35.3740342Z Running inductor/test_cpu_select_algorithm 1/1 ... [2025-12-04 15:10:35.373650][21392.983558377]
2025-12-04T15:10:35.3741013Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:10:35.3743713Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpu_select_algorithm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:10:35.374076]
2025-12-04T15:10:47.5657682Z 
2025-12-04T15:10:47.5658762Z inductor/test_cpu_select_algorithm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpu_select_algorithm_1.1_2b85f4e0fd3f066c_.log
2025-12-04T15:10:47.5659818Z Running 0 items in this shard:
2025-12-04T15:10:47.5660029Z 
2025-12-04T15:10:47.5660418Z Finished inductor/test_cpu_select_algorithm 1/1 ... [2025-12-04 15:10:47.565542][21405.17545116], took 0.20min
2025-12-04T15:10:47.5942417Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cpu_select_algorithm/inductor.test_cpu_select_algorithm-99091fae53aceb8e.xml
2025-12-04T15:10:48.8886576Z Uploading artifacts took 1.22 seconds
2025-12-04T15:10:48.8890986Z Running test_custom_ops 1/1 ... [2025-12-04 15:10:48.888875][21406.498782603]
2025-12-04T15:10:48.8891594Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:10:48.8895503Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_custom_ops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:10:48.889311]
2025-12-04T15:11:31.7664530Z 
2025-12-04T15:11:31.7667693Z test_custom_ops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_custom_ops_1.1_37d60717605e8cfe_.log
2025-12-04T15:11:31.7770181Z Running 282 items in this shard: test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op, test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op_with_CompositeExplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op_with_CompositeImplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_abstract_impl_on_existing_op_with_meta, test/test_custom_ops.py::TestCustomOp::test_autogen_aten_ops_are_pt2_compliant, test/test_custom_ops.py::TestCustomOp::test_autograd_function_backed_op, test/test_custom_ops.py::TestCustomOp::test_autograd_notimplemented, test/test_custom_ops.py::TestCustomOp::test_autograd_notimplemented_gradmode, test/test_custom_ops.py::TestCustomOp::test_backward_dict_grad_for_nontensor, test/test_custom_ops.py::TestCustomOp::test_backward_dict_invalid_keys, test/test_custom_ops.py::TestCustomOp::test_backward_dict_requires_keys_for_input_optional_tensors, test/test_custom_ops.py::TestCustomOp::test_backward_dict_requires_keys_for_input_tensors, test/test_custom_ops.py::TestCustomOp::test_backward_grads_are_tensor_or_none, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_CompositeImplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_incorrect_schema_mutable, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_incorrect_schema_no_output, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_incorrect_schema_views, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_with_key_key_Autograd, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_with_key_key_AutogradCPU, test/test_custom_ops.py::TestCustomOp::test_backward_impl_on_existing_op_with_key_key_AutogradCUDA, 
test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_non_tensor, test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_numel, test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_tensorlist, test/test_custom_ops.py::TestCustomOp::test_backward_output_differentiability_type, test/test_custom_ops.py::TestCustomOp::test_backward_partially_registered, test/test_custom_ops.py::TestCustomOp::test_backward_returns_dict, test/test_custom_ops.py::TestCustomOp::test_backward_tensorlist_input_requires_list_grads, test/test_custom_ops.py::TestCustomOp::test_backward_tensorlist_input_requires_list_grads_none_or_Tensor, test/test_custom_ops.py::TestCustomOp::test_backward_tensorlist_input_requires_list_grads_with_same_numel, test/test_custom_ops.py::TestCustomOp::test_basic_make_fx, test/test_custom_ops.py::TestCustomOp::test_builtin_aten_ops_are_pt2_compliant, test/test_custom_ops.py::TestCustomOp::test_builtin_torchscript_ops, test/test_custom_ops.py::TestCustomOp::test_data_dependent_basic, test/test_custom_ops.py::TestCustomOp::test_data_dependent_compile, test/test_custom_ops.py::TestCustomOp::test_data_dependent_fake_tracing, test/test_custom_ops.py::TestCustomOp::test_data_dependent_nms_dynamic_compile, test/test_custom_ops.py::TestCustomOp::test_define_and_impl, test/test_custom_ops.py::TestCustomOp::test_define_bad_schema, test/test_custom_ops.py::TestCustomOp::test_define_validation, test/test_custom_ops.py::TestCustomOp::test_define_with_tags_list, test/test_custom_ops.py::TestCustomOp::test_define_with_tags_single, test/test_custom_ops.py::TestCustomOp::test_define_with_tags_tuple, test/test_custom_ops.py::TestCustomOp::test_defined_in_python, test/test_custom_ops.py::TestCustomOp::test_duplicate_impl, test/test_custom_ops.py::TestCustomOp::test_functionalize_error, test/test_custom_ops.py::TestCustomOp::test_impl_abstract_overload, test/test_custom_ops.py::TestCustomOp::test_impl_cpu, 
test/test_custom_ops.py::TestCustomOp::test_impl_device_cpu, test/test_custom_ops.py::TestCustomOp::test_impl_device_cuda, test/test_custom_ops.py::TestCustomOp::test_impl_device_function, test/test_custom_ops.py::TestCustomOp::test_impl_device_invalid, test/test_custom_ops.py::TestCustomOp::test_impl_function, test/test_custom_ops.py::TestCustomOp::test_impl_invalid_devices, test/test_custom_ops.py::TestCustomOp::test_impl_meta, test/test_custom_ops.py::TestCustomOp::test_impl_multiple, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CPU, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CUDA, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CompositeExplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_impl_on_existing_op_with_cpu_registration_key_CompositeImplicitAutograd, test/test_custom_ops.py::TestCustomOp::test_impl_separate, test/test_custom_ops.py::TestCustomOp::test_incorrect_schema_types, test/test_custom_ops.py::TestCustomOp::test_infer_schema_no_return, test/test_custom_ops.py::TestCustomOp::test_infer_schema_supported, test/test_custom_ops.py::TestCustomOp::test_infer_schema_unsupported, test/test_custom_ops.py::TestCustomOp::test_invalid_qualname, test/test_custom_ops.py::TestCustomOp::test_invalid_schemas, test/test_custom_ops.py::TestCustomOp::test_is_functional_schema, test/test_custom_ops.py::TestCustomOp::test_is_tensorlist_like_type, test/test_custom_ops.py::TestCustomOp::test_legacy_define, test/test_custom_ops.py::TestCustomOp::test_legacy_impl, test/test_custom_ops.py::TestCustomOp::test_lifetime, test/test_custom_ops.py::TestCustomOp::test_load_library, test/test_custom_ops.py::TestCustomOp::test_meta_for_data_dependent_shape_operation, test/test_custom_ops.py::TestCustomOp::test_name_must_match, 
test/test_custom_ops.py::TestCustomOp::test_new_data_dependent_symint, test/test_custom_ops.py::TestCustomOp::test_not_implemented_error, test/test_custom_ops.py::TestCustomOp::test_override_cea, test/test_custom_ops.py::TestCustomOp::test_override_fake, test/test_custom_ops.py::TestCustomOp::test_override_impl, test/test_custom_ops.py::TestCustomOp::test_override_meta, test/test_custom_ops.py::TestCustomOp::test_private_ctor, test/test_custom_ops.py::TestCustomOp::test_reserved_ns, test/test_custom_ops.py::TestCustomOp::test_resolve_packet, test/test_custom_ops.py::TestCustomOp::test_save_for_backward_inputs_are_namedtuple, test/test_custom_ops.py::TestCustomOp::test_schema_matches_signature, test/test_custom_ops.py::TestCustomOp::test_sequences, test/test_custom_ops.py::TestCustomOp::test_supported_param_types, test/test_custom_ops.py::TestCustomOp::test_supported_return_types_multi_return, test/test_custom_ops.py::TestCustomOp::test_supported_return_types_single_return, test/test_custom_ops.py::TestCustomOp::test_supported_schemas, test/test_custom_ops.py::TestCustomOp::test_symints, test/test_custom_ops.py::TestCustomOp::test_unsupported_param_types, test/test_custom_ops.py::TestCustomOp::test_unsupported_schemas, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_inplace, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_no_abstract, 
test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_dynamic__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_inplace, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_aot_dispatch_static__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_inplace, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_autograd_registration__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_dont_generate, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_delayed_error, 
test/test_custom_ops.py::MiniOpTest::test_faketensor__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_inplace, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_faketensor__test_nonzero, test/test_custom_ops.py::MiniOpTest::test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_inplace, test/test_custom_ops.py::MiniOpTest::test_mm, test/test_custom_ops.py::MiniOpTest::test_mm_errors, test/test_custom_ops.py::MiniOpTest::test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_nonzero, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_aten_mm, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_aten_nonzero, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_aten_sin_, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_mini_op_test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_mini_op_test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_pt2_compliant_tag_mini_op_test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_schema__test_delayed_error, test/test_custom_ops.py::MiniOpTest::test_schema__test_delayed_error_no_requires_grad, test/test_custom_ops.py::MiniOpTest::test_schema__test_incorrect_schema, test/test_custom_ops.py::MiniOpTest::test_schema__test_inplace, test/test_custom_ops.py::MiniOpTest::test_schema__test_mm, test/test_custom_ops.py::MiniOpTest::test_schema__test_mm_errors, 
test/test_custom_ops.py::MiniOpTest::test_schema__test_mm_fake, test/test_custom_ops.py::MiniOpTest::test_schema__test_mm_meta, test/test_custom_ops.py::MiniOpTest::test_schema__test_no_abstract, test/test_custom_ops.py::MiniOpTest::test_schema__test_nonzero, test/test_custom_ops.py::TestCustomOpAPI::test_any_output_is_alias_to_input_or_output, test/test_custom_ops.py::TestCustomOpAPI::test_any_requires_grad, test/test_custom_ops.py::TestCustomOpAPI::test_basic, test/test_custom_ops.py::TestCustomOpAPI::test_compile, test/test_custom_ops.py::TestCustomOpAPI::test_default_values, test/test_custom_ops.py::TestCustomOpAPI::test_disallows_output_aliasing, test/test_custom_ops.py::TestCustomOpAPI::test_factory_function, test/test_custom_ops.py::TestCustomOpAPI::test_fake, test/test_custom_ops.py::TestCustomOpAPI::test_kwarg_only_tensors, test/test_custom_ops.py::TestCustomOpAPI::test_layout_constraint_tags, test/test_custom_ops.py::TestCustomOpAPI::test_library_get_kernel, test/test_custom_ops.py::TestCustomOpAPI::test_library_get_kernel_invalid, test/test_custom_ops.py::TestCustomOpAPI::test_library_get_kernel_with_conditional_dispatch, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_list_input, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_multiple_times, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autocast_multiple_times_different_devices, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autograd, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_autograd_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_0, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_1, 
test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_2, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_3, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_4, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_fake_source_idx_5, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_kernel, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_kernel_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch_rule_mode, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_torch_dispatch_rule_subclass, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_library_decorator, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_op_decorator, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_register_multiple_times, test/test_custom_ops.py::TestCustomOpAPI::test_library_register_vmap_register_multiple_times_2, test/test_custom_ops.py::TestCustomOpAPI::test_library_schema_infer, test/test_custom_ops.py::TestCustomOpAPI::test_manual_schema, test/test_custom_ops.py::TestCustomOpAPI::test_manual_schema_error, test/test_custom_ops.py::TestCustomOpAPI::test_multi_types, test/test_custom_ops.py::TestCustomOpAPI::test_mutated, test/test_custom_ops.py::TestCustomOpAPI::test_mutated_error, test/test_custom_ops.py::TestCustomOpAPI::test_mutated_unknown, test/test_custom_ops.py::TestCustomOpAPI::test_no_grad_skips_autograd, test/test_custom_ops.py::TestCustomOpAPI::test_overloading, test/test_custom_ops.py::TestCustomOpAPI::test_register_autograd_defaults, test/test_custom_ops.py::TestCustomOpAPI::test_register_autograd_error_cases, 
test/test_custom_ops.py::TestCustomOpAPI::test_register_autograd_kwargonly_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_register_vmap_defaults, test/test_custom_ops.py::TestCustomOpAPI::test_register_vmap_kwargonly_low_level, test/test_custom_ops.py::TestCustomOpAPI::test_replacement, test/test_custom_ops.py::TestCustomOpAPI::test_set_kernel_enabled, test/test_custom_ops.py::TestCustomOpAPI::test_split_device, test/test_custom_ops.py::TestCustomOpAPI::test_subclass_accessor_view, test/test_custom_ops.py::TestCustomOpAPI::test_subclass_accessor_view_error, test/test_custom_ops.py::TestCustomOpAPI::test_supports_tensorlist, test/test_custom_ops.py::MiniOpTestOther::test_aot_dispatch_dynamic__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_aot_dispatch_static__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_autograd_registration__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_faketensor__test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_nonzero_again, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_aten_mm, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_aten_nonzero, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_aten_sin_, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_mini_op_test_delayed_error, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_mini_op_test_incorrect_schema, test/test_custom_ops.py::MiniOpTestOther::test_pt2_compliant_tag_mini_op_test_no_abstract, test/test_custom_ops.py::MiniOpTestOther::test_schema__test_nonzero_again, test/test_custom_ops.py::TestGenerateOpcheckTests::test_MiniOpTest, test/test_custom_ops.py::TestGenerateOpcheckTests::test_dont_generate_decorator, test/test_custom_ops.py::TestGenerateOpcheckTests::test_failures_dict_validation, test/test_custom_ops.py::TestGenerateOpcheckTests::test_generate_repro_no_save_data, 
test/test_custom_ops.py::TestGenerateOpcheckTests::test_generate_repro_save_data, test/test_custom_ops.py::TestGenerateOpcheckTests::test_is_inside_opcheck_mode, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck_bad_op, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck_customopdef, test/test_custom_ops.py::TestGenerateOpcheckTests::test_opcheck_does_not_require_extra_deps, test/test_custom_ops.py::TestTypeConversion::test_mixed_types, test/test_custom_ops.py::TestTypeConversion::test_optional, test/test_custom_ops.py::TestTypeConversion::test_simple_tuple, test/test_custom_ops.py::TestTypeConversion::test_supported_types, test/test_custom_ops.py::TestOpProfiles::test_duplicate_registration_custom_op, test/test_custom_ops.py::TestOpProfiles::test_duplicate_registration_impl, test/test_custom_ops.py::TestOpProfiles::test_fake_registration, test/test_custom_ops.py::TestOpProfiles::test_save_to_file, test/test_custom_ops.py::TestOpProfiles::test_version, test/test_custom_ops.py::TestOpProfiles::test_yaml, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_False_dynamic_False_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_False_dynamic_True_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_auto_dynamic_False_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_aot_autograd_check_degenerate_cases_check_gradients_auto_dynamic_True_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_assert_raises_regex_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registered_at_backend_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_autograd_kernel_cuda, 
test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_compositeimplicitautograd_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_incorrect_composite_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_autograd_registration_check_incorrect_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_global_state_mutation_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_incorrect_abstract_impl_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_incorrect_schema_mutation_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_incorrect_schema_view_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_missing_abstract_impl_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_missing_functionalization_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_fails_basic_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyCatCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyCubeCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyMulCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyMulScalarCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyNMSCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyNonzeroCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpySortCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpySplitCopyCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpySplitCopyWithIntCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyTakeCustomOp_cuda_float32, 
test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_opinfo_NumpyViewCopyCustomOp_cuda_float32, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_opcheck_unbacked_stride_cuda, test/test_custom_ops.py::TestCustomOpTestingCUDA::test_single_element_tuple_output_cuda
2025-12-04T15:11:31.7871366Z 
2025-12-04T15:11:31.7871770Z Finished test_custom_ops 1/1 ... [2025-12-04 15:11:31.766701][21449.376607372], took 0.71min
2025-12-04T15:11:31.7958681Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_custom_ops/test_custom_ops-7a9f392fc312693f.xml
2025-12-04T15:11:31.8810246Z Running inductor/test_analysis 1/1 ... [2025-12-04 15:11:31.880722][21449.490628642]
2025-12-04T15:11:31.8810877Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:11:31.8813784Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_analysis.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:11:31.881139]
2025-12-04T15:11:44.1135055Z 
2025-12-04T15:11:44.1136046Z inductor/test_analysis 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_analysis_1.1_a128307487ad43a3_.log
2025-12-04T15:11:44.1149835Z Running 28 items in this shard: test/inductor/test_analysis.py::TestUtils::test_tabulate2d, test/inductor/test_analysis.py::TestUtils::test_zip_dicts, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat0_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat1_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat1_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat2_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat2_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat3_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_against_flop_counter_maxat3_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_augment_trace_helper_unit_cuda, test/inductor/test_analysis.py::TestAnalysisCUDA::test_combine_profiles_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_combine_profiles_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_diff_cuda_float64, test/inductor/test_analysis.py::TestAnalysisCUDA::test_noop_cuda, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat0_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat1_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat1_cuda_float32, 
test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat2_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat2_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat3_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_pointwise_bandwidth_maxat3_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float16, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float32, test/inductor/test_analysis.py::TestAnalysisCUDA::test_triton_has_metadata_maxat0_cuda_float64
2025-12-04T15:11:44.1163166Z 
2025-12-04T15:11:44.1163537Z Finished inductor/test_analysis 1/1 ... [2025-12-04 15:11:44.113307][21461.72321616], took 0.20min
2025-12-04T15:11:44.1422820Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_analysis/inductor.test_analysis-ef614f735877f798.xml
2025-12-04T15:11:44.2183817Z Running inductor/test_pad_mm 1/1 ... [2025-12-04 15:11:44.218067][21461.827973701]
2025-12-04T15:11:44.2184381Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:11:44.2187188Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_pad_mm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:11:44.218473]
2025-12-04T15:11:54.2976612Z 
2025-12-04T15:11:54.2977592Z inductor/test_pad_mm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_pad_mm_1.1_bfb512e8053e306d_.log
2025-12-04T15:11:54.2983636Z Running 19 items in this shard: test/inductor/test_pad_mm.py::PadMMTest::test_cat_pad_mm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_exclude_cat_padding, test/inductor/test_pad_mm.py::PadMMTest::test_exclude_padding, test/inductor/test_pad_mm.py::PadMMTest::test_no_autocast_in_pad_bmm_joint_graph_pass, test/inductor/test_pad_mm.py::PadMMTest::test_original_aten_preserved_pad_mm, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_2d_bias, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_pad_addmm_dyn_mn, test/inductor/test_pad_mm.py::PadMMTest::test_pad_batch, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_b, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_bm, test/inductor/test_pad_mm.py::PadMMTest::test_pad_bmm_dyn_k, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_bf16, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_k, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_m, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_mnk, test/inductor/test_pad_mm.py::PadMMTest::test_pad_mm_dyn_n, test/inductor/test_pad_mm.py::PadMMTest::test_pad_single_cat, test/inductor/test_pad_mm.py::PadMMTest::test_zero_dim
2025-12-04T15:11:54.2989286Z 
2025-12-04T15:11:54.2989599Z Finished inductor/test_pad_mm 1/1 ... [2025-12-04 15:11:54.297439][21471.907348482], took 0.17min
2025-12-04T15:11:54.3264352Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_pad_mm/inductor.test_pad_mm-cc450381ece2a8f9.xml
2025-12-04T15:11:54.4044660Z Running inductor/test_triton_syntax 1/1 ... [2025-12-04 15:11:54.404069][21472.013976541]
2025-12-04T15:11:54.4045256Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:11:54.4047844Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_syntax.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:11:54.404542]
2025-12-04T15:12:15.2499910Z 
2025-12-04T15:12:15.2501555Z inductor/test_triton_syntax 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_syntax_1.1_cd6b570d7971cca9_.log
2025-12-04T15:12:15.2502928Z Running 1 items in this shard: test/inductor/test_triton_syntax.py::TestTritonSyntacticallyValid::test_triton_sqrt
2025-12-04T15:12:15.2503514Z 
2025-12-04T15:12:15.2503890Z Finished inductor/test_triton_syntax 1/1 ... [2025-12-04 15:12:15.249753][21492.859660579], took 0.35min
2025-12-04T15:12:15.2792852Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_syntax/inductor.test_triton_syntax-898dc985a45c41c6.xml
2025-12-04T15:12:15.3582110Z Running inductor/test_triton_extension_backend 1/1 ... [2025-12-04 15:12:15.357834][21492.967741898]
2025-12-04T15:12:15.3582762Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:12:15.3585405Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_triton_extension_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:12:15.358271]
2025-12-04T15:12:27.4791661Z 
2025-12-04T15:12:27.4793048Z inductor/test_triton_extension_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_triton_extension_backend_1.1_e218feea67d6cd2a_.log
2025-12-04T15:12:27.4794134Z Running 0 items in this shard:
2025-12-04T15:12:27.4794360Z 
2025-12-04T15:12:27.4794780Z Finished inductor/test_triton_extension_backend 1/1 ... [2025-12-04 15:12:27.478949][21505.088858698], took 0.20min
2025-12-04T15:12:27.5078133Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_triton_extension_backend/inductor.test_triton_extension_backend-1a18cee9beef4f55.xml
2025-12-04T15:12:27.5760378Z Running test_sparse_semi_structured 1/1 ... [2025-12-04 15:12:27.575708][21505.18561558]
2025-12-04T15:12:27.5760963Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:12:27.5764282Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_sparse_semi_structured.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:12:27.576140]
2025-12-04T15:12:37.7557454Z 
2025-12-04T15:12:37.7558410Z test_sparse_semi_structured 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_sparse_semi_structured_1.1_4dd53f61ed651a5b_.log
2025-12-04T15:12:37.7580189Z Running 42 items in this shard: test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_mlp_contiguous_relu_compile_cusparselt, test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_mlp_contiguous_relu_compile_cutlass, test/test_sparse_semi_structured.py::SparseSemiStructuredTensorCompileTest::test_sp24_compile, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_indices, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_linear, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_min_sparse_shape, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mlp, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_NN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_NT, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_first_TN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_second_NN, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_mm_sparse_second_NT, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_to_sparse_semi_structured, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_dim, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_unsupported_shape, test/test_sparse_semi_structured.py::TestSparseSemiStructured::test_values, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_gemm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_edge_case1, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_id, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pack_both_ways_meta_correctness, 
test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_prune_dense_static_sort, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_pruning_algo_largest_abs_values_greedy, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_apply, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_apply_dense, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls_bmm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredTraining::test_sp24_matmuls_mat_vec, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_conversions, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_conversions_all_patterns, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_linear_cutlass, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUTLASS::test_sparse_semi_structured_ops_cutlass, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha_compile_autotune, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_alpha_mixed_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_mixed_dtype, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cslt_sparse_mm_search, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_csrc_cslt_sparse_mm_search, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_cusparselt_backend, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_fp8fp8_mm, 
test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_semi_structured_scaled_mm, test/test_sparse_semi_structured.py::TestSparseSemiStructuredCUSPARSELT::test_sparse_semi_structured_scaled_mm_fp8
2025-12-04T15:12:37.7600688Z 
2025-12-04T15:12:37.7601248Z Finished test_sparse_semi_structured 1/1 ... [2025-12-04 15:12:37.755555][21515.365463772], took 0.17min
2025-12-04T15:12:37.7848778Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_sparse_semi_structured/test_sparse_semi_structured-4f8d9547a4d851ec.xml
2025-12-04T15:12:37.8632297Z Running inductor/test_op_completeness 1/1 ... [2025-12-04 15:12:37.862908][21515.472815722]
2025-12-04T15:12:37.8632903Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:12:37.8636115Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_op_completeness.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:12:37.863340]
2025-12-04T15:12:43.7362841Z 
2025-12-04T15:12:43.7364684Z inductor/test_op_completeness 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_op_completeness_1.1_5deb9907383c3460_.log
2025-12-04T15:12:43.7370864Z Running 5 items in this shard: test/inductor/test_op_completeness.py::TestOpCompleteness::test_cpp_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_cpp_vec_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_halide_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_metal_overrides, test/inductor/test_op_completeness.py::TestOpCompleteness::test_triton_overrides
2025-12-04T15:12:43.7375507Z 
2025-12-04T15:12:43.7376247Z Finished inductor/test_op_completeness 1/1 ... [2025-12-04 15:12:43.736053][21521.345961931], took 0.10min
2025-12-04T15:12:43.7662116Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_op_completeness/inductor.test_op_completeness-7d3f24a957250fde.xml
2025-12-04T15:12:43.7979965Z Running inductor/test_subgraph_choice 1/1 ... [2025-12-04 15:12:43.797628][21521.407536432]
2025-12-04T15:12:43.7980922Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:12:43.7984447Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_subgraph_choice.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:12:43.798090]
2025-12-04T15:13:02.9413754Z 
2025-12-04T15:13:02.9415320Z inductor/test_subgraph_choice 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_subgraph_choice_1.1_927735b69ebf1973_.log
2025-12-04T15:13:02.9417345Z Running 2 items in this shard: test/inductor/test_subgraph_choice.py::TestSubgraphChoice::test_subgraph_decompose_k, test/inductor/test_subgraph_choice.py::TestSubgraphChoice::test_subgraph_freeze_layout
2025-12-04T15:13:02.9418388Z 
2025-12-04T15:13:02.9418760Z Finished inductor/test_subgraph_choice 1/1 ... [2025-12-04 15:13:02.941149][21540.551056066], took 0.32min
2025-12-04T15:13:02.9709384Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_subgraph_choice/inductor.test_subgraph_choice-2437d978fade4f96.xml
2025-12-04T15:13:03.0640275Z Running inductor/test_cutedsl_grouped_mm 1/1 ... [2025-12-04 15:13:03.063641][21540.673547417]
2025-12-04T15:13:03.0640982Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:13:03.0644182Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cutedsl_grouped_mm.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:13:03.064087]
2025-12-04T15:13:08.3364690Z 
2025-12-04T15:13:08.3365964Z inductor/test_cutedsl_grouped_mm 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cutedsl_grouped_mm_1.1_4f25a6335f622148_.log
2025-12-04T15:13:08.3382049Z Running 24 items in this shard: test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_contiguous_layout_B_broadcasted, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_contiguous_layout_B_contiguous, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_offset_layout_B_broadcasted, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_offset_layout_B_contiguous, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_padded_layout_B_broadcasted, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_padded_layout_B_contiguous, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_view_layout_B_broadcasted, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_assorted_layouts_layout_A_view_layout_B_contiguous, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_1024_K_128_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_1024_K_128_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_1024_K_64_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_1024_K_64_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_256_K_128_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_256_K_128_N_256, 
test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_256_K_64_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_2_M_hint_256_K_64_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_1024_K_128_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_1024_K_128_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_1024_K_64_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_1024_K_64_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_256_K_128_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_256_K_128_N_256, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_256_K_64_N_128, test/inductor/test_cutedsl_grouped_mm.py::TestCuTeDSLGroupedGemm::test_grouped_gemm_basic_group_size_8_M_hint_256_K_64_N_256
2025-12-04T15:13:08.3396925Z 
2025-12-04T15:13:08.3397322Z Finished inductor/test_cutedsl_grouped_mm 1/1 ... [2025-12-04 15:13:08.336299][21545.946206704], took 0.09min
2025-12-04T15:13:08.3662274Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cutedsl_grouped_mm/inductor.test_cutedsl_grouped_mm-9a993ae92ea5ca0a.xml
2025-12-04T15:13:08.4032645Z Running inductor/test_cpp_wrapper_hipify 1/1 ... [2025-12-04 15:13:08.402922][21546.012828598]
2025-12-04T15:13:08.4033242Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:13:08.4036038Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cpp_wrapper_hipify.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:13:08.403335]
2025-12-04T15:13:14.7269544Z 
2025-12-04T15:13:14.7270692Z inductor/test_cpp_wrapper_hipify 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cpp_wrapper_hipify_1.1_353d02c262482f20_.log
2025-12-04T15:13:14.7273063Z Running 3 items in this shard: test/inductor/test_cpp_wrapper_hipify.py::TestCppWrapperHipify::test_hipify_aoti_driver_header, test/inductor/test_cpp_wrapper_hipify.py::TestCppWrapperHipify::test_hipify_basic_declaration, test/inductor/test_cpp_wrapper_hipify.py::TestCppWrapperHipify::test_hipify_cross_platform
2025-12-04T15:13:14.7274643Z 
2025-12-04T15:13:14.7275028Z Finished inductor/test_cpp_wrapper_hipify 1/1 ... [2025-12-04 15:13:14.726724][21552.336633601], took 0.11min
2025-12-04T15:13:14.7565948Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cpp_wrapper_hipify/inductor.test_cpp_wrapper_hipify-5078284f3b2f2998.xml
2025-12-04T15:13:14.8470771Z Running inductor/test_inductor_utils 1/1 ... [2025-12-04 15:13:14.846740][21552.456647425]
2025-12-04T15:13:14.8471369Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:13:14.8477202Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inductor_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:13:14.847159]
2025-12-04T15:13:23.0734019Z 
2025-12-04T15:13:23.0735174Z inductor/test_inductor_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inductor_utils_1.1_67afa62609840b86_.log
2025-12-04T15:13:23.0736780Z Running 2 items in this shard: test/inductor/test_inductor_utils.py::TestBench::test_benchmarker, test/inductor/test_inductor_utils.py::TestBench::test_do_bench_using_profiling
2025-12-04T15:13:23.0737676Z 
2025-12-04T15:13:23.0738053Z Finished inductor/test_inductor_utils 1/1 ... [2025-12-04 15:13:23.073169][21560.68307914], took 0.14min
2025-12-04T15:13:23.1029165Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_inductor_utils/inductor.test_inductor_utils-fea0c873b74a6a46.xml
2025-12-04T15:13:23.1849301Z Running inductor/test_template_heuristics_registry 1/1 ... [2025-12-04 15:13:23.184630][21560.79453802]
2025-12-04T15:13:23.1849972Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:13:23.1853381Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_template_heuristics_registry.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:13:23.185064]
2025-12-04T15:13:29.6586708Z 
2025-12-04T15:13:29.6588165Z inductor/test_template_heuristics_registry 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_template_heuristics_registry_1.1_3f598775c056439a_.log
2025-12-04T15:13:29.6592079Z Running 5 items in this shard: test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_assertion_existing_class, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_fallback_behavior, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_hierarchy_lookup, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_partial_hierarchy_scenarios, test/inductor/test_template_heuristics_registry.py::TestTemplateHeuristicsRegistry::test_register_class
2025-12-04T15:13:29.6595121Z 
2025-12-04T15:13:29.6595561Z Finished inductor/test_template_heuristics_registry 1/1 ... [2025-12-04 15:13:29.658452][21567.268361704], took 0.11min
2025-12-04T15:13:29.6882016Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_template_heuristics_registry/inductor.test_template_heuristics_registry-f03db733e7237771.xml
2025-12-04T15:13:29.7660004Z Running inductor/test_async_compile 1/1 ... [2025-12-04 15:13:29.765646][21567.375552441]
2025-12-04T15:13:29.7660598Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:13:29.7663345Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_async_compile.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:13:29.766075]
2025-12-04T15:14:46.0425306Z 
2025-12-04T15:14:46.0426439Z inductor/test_async_compile 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_async_compile_1.1_887cb91e60faea2f_.log
2025-12-04T15:14:46.0430639Z Running 8 items in this shard: test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_fork, test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_spawn, test/inductor/test_async_compile.py::TestAsyncCompile::test_autotune_lookup_table_method_subprocess, test/inductor/test_async_compile.py::TestAsyncCompile::test_bad_kernel, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_fork, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_spawn, test/inductor/test_async_compile.py::TestAsyncCompile::test_pool_method_subprocess, test/inductor/test_async_compile.py::TestAsyncCompile::test_wait_pool_ready
2025-12-04T15:14:46.0434448Z 
2025-12-04T15:14:46.0434938Z Finished inductor/test_async_compile 1/1 ... [2025-12-04 15:14:46.042313][21643.652221601], took 1.27min
2025-12-04T15:14:46.0729963Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_async_compile/inductor.test_async_compile-26761717acf278af.xml
2025-12-04T15:14:46.1575991Z Running dynamo/test_deque_reconstruct 1/1 ... [2025-12-04 15:14:46.157250][21643.767157403]
2025-12-04T15:14:46.1576605Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:14:46.1579623Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_deque_reconstruct.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:14:46.157703]
2025-12-04T15:14:53.8833547Z 
2025-12-04T15:14:53.8834650Z dynamo/test_deque_reconstruct 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_deque_reconstruct_1.1_f8b7d34594077ea6_.log
2025-12-04T15:14:53.8837061Z Running 3 items in this shard: test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_in_globals, test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_not_in_globals, test/dynamo/test_deque_reconstruct.py::TestDequeReconstruct::test_deque_reconstruct_shallows_globals
2025-12-04T15:14:53.8838722Z 
2025-12-04T15:14:53.8839341Z Finished dynamo/test_deque_reconstruct 1/1 ... [2025-12-04 15:14:53.883136][21651.493046313], took 0.13min
2025-12-04T15:14:53.9134014Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-87f577525bf4c9e0.xml
2025-12-04T15:14:53.9885084Z Running inductor/test_utils 1/1 ... [2025-12-04 15:14:53.988222][21651.598120994]
2025-12-04T15:14:53.9885611Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:14:53.9888774Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:14:53.988653]
2025-12-04T15:15:01.1133872Z 
2025-12-04T15:15:01.1134847Z inductor/test_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_utils_1.1_63e5e2174acc542d_.log
2025-12-04T15:15:01.1138059Z Running 7 items in this shard: test/inductor/test_utils.py::TestUtilsCUDA::testSympySubs_cuda, test/inductor/test_utils.py::TestUtilsCUDA::test_flops_fx_cuda, test/inductor/test_utils.py::TestUtilsCUDA::test_get_device_tflops_cuda_bfloat16, test/inductor/test_utils.py::TestUtilsCUDA::test_get_device_tflops_cuda_float16, test/inductor/test_utils.py::TestUtilsCUDA::test_get_device_tflops_cuda_float32, test/inductor/test_utils.py::TestUtilsCUDA::test_sympy_str_cuda, test/inductor/test_utils.py::TestUtilsCUDA::test_zip_schema_cuda
2025-12-04T15:15:01.1140626Z 
2025-12-04T15:15:01.1140952Z Finished inductor/test_utils 1/1 ... [2025-12-04 15:15:01.113183][21658.72309247], took 0.12min
2025-12-04T15:15:01.1437458Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_utils/inductor.test_utils-906071f9e5aa0510.xml
2025-12-04T15:15:01.2442179Z Running inductor/test_indexing 1/1 ... [2025-12-04 15:15:01.243921][21658.853828365]
2025-12-04T15:15:01.2442771Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:15:01.2445966Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_indexing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:15:01.244361]
2025-12-04T15:15:20.7877527Z 
2025-12-04T15:15:20.7878526Z inductor/test_indexing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_indexing_1.1_2bd025888cab1cf8_.log
2025-12-04T15:15:20.7888640Z Running 22 items in this shard: test/inductor/test_indexing.py::TestIndexingSimplification::test_expand_floor_div_applied, test/inductor/test_indexing.py::TestIndexingSimplification::test_expand_floor_div_skipped, test/inductor/test_indexing.py::TestIndexingSimplification::test_floordiv_div_sympy_is_integer_bug, test/inductor/test_indexing.py::TestIndexingSimplification::test_indexing_join, test/inductor/test_indexing.py::TestIndexingSimplification::test_indexing_simplification, test/inductor/test_indexing.py::TestIndexingSimplification::test_int8_unpack, test/inductor/test_indexing.py::TestIndexingSimplification::test_modular_indexing_pairs_merged, test/inductor/test_indexing.py::TestIndexingSimplification::test_modular_indexing_pairs_not_merged, test/inductor/test_indexing.py::TestIndexingSimplification::test_modular_indexing_positive, test/inductor/test_indexing.py::ExprPrinterTests::test_print_Min_Max, test/inductor/test_indexing.py::ExprPrinterTests::test_print_ceil, test/inductor/test_indexing.py::ExprPrinterTests::test_print_floor, test/inductor/test_indexing.py::ExprPrinterTests::test_print_floor_div, test/inductor/test_indexing.py::ExprPrinterTests::test_print_integer, test/inductor/test_indexing.py::ExprPrinterTests::test_print_mod, test/inductor/test_indexing.py::ExprPrinterTests::test_print_mod_index, test/inductor/test_indexing.py::ExprPrinterTests::test_print_pow, test/inductor/test_indexing.py::ExprPrinterTests::test_print_python_mod, test/inductor/test_indexing.py::ExprPrinterTests::test_print_round, test/inductor/test_indexing.py::ExprPrinterTests::test_print_round_decimal_ndigits_-1, test/inductor/test_indexing.py::ExprPrinterTests::test_print_round_decimal_ndigits_0, test/inductor/test_indexing.py::ExprPrinterTests::test_print_round_decimal_ndigits_1
2025-12-04T15:15:20.7897785Z 
2025-12-04T15:15:20.7898124Z Finished inductor/test_indexing 1/1 ... [2025-12-04 15:15:20.787556][21678.397464601], took 0.33min
2025-12-04T15:15:20.8184358Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_indexing/inductor.test_indexing-059deccacca9b28a.xml
2025-12-04T15:15:20.9661624Z Running inductor/test_inductor_annotations 1/1 ... [2025-12-04 15:15:20.965836][21678.575743171]
2025-12-04T15:15:20.9662256Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:15:20.9665307Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inductor_annotations.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:15:20.966276]
2025-12-04T15:15:39.5085720Z 
2025-12-04T15:15:39.5086840Z inductor/test_inductor_annotations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inductor_annotations_1.1_e129b89bdd73962f_.log
2025-12-04T15:15:39.5088833Z Running 2 items in this shard: test/inductor/test_inductor_annotations.py::InductorAnnotationTestCase::test_no_annotations, test/inductor/test_inductor_annotations.py::InductorAnnotationTestCase::test_training_annotation
2025-12-04T15:15:39.5089975Z 
2025-12-04T15:15:39.5090388Z Finished inductor/test_inductor_annotations 1/1 ... [2025-12-04 15:15:39.508348][21697.118257512], took 0.31min
2025-12-04T15:15:39.5389634Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-a710efcfde282e90.xml
2025-12-04T15:15:39.6196622Z Running inductor/test_compile_worker 1/1 ... [2025-12-04 15:15:39.619354][21697.229261204]
2025-12-04T15:15:39.6197237Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:15:39.6200316Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_compile_worker.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:15:39.619777]
2025-12-04T15:17:11.8158341Z 
2025-12-04T15:17:11.8159689Z inductor/test_compile_worker 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_compile_worker_1.1_00f9da717f84f877_.log
2025-12-04T15:17:11.8166704Z Running 16 items in this shard: test/inductor/test_compile_worker.py::TestCompileWorker::test_basic_jobs, test/inductor/test_compile_worker.py::TestCompileWorker::test_crash, test/inductor/test_compile_worker.py::TestCompileWorker::test_exception, test/inductor/test_compile_worker.py::TestCompileWorker::test_logging, test/inductor/test_compile_worker.py::TestCompileWorker::test_quiesce, test/inductor/test_compile_worker.py::TestCompileWorker::test_quiesce_repeatedly, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_basic_jobs, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_crash, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_exception, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_logging, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_quiesce, test/inductor/test_compile_worker.py::TestCompileWorkerWithTimer::test_quiesce_repeatedly, test/inductor/test_compile_worker.py::TestTimer::test_basics, test/inductor/test_compile_worker.py::TestTimer::test_never_fires, test/inductor/test_compile_worker.py::TestTimer::test_repeated_calls, test/inductor/test_compile_worker.py::TestTimer::test_spammy_calls
2025-12-04T15:17:11.8173089Z 
2025-12-04T15:17:11.8173558Z Finished inductor/test_compile_worker 1/1 ... [2025-12-04 15:17:11.815624][21789.425533212], took 1.54min
2025-12-04T15:17:11.8469400Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-2b558a130ccb3642.xml
2025-12-04T15:17:11.9207547Z Running dynamo/test_einops 1/1 ... [2025-12-04 15:17:11.920423][21789.530329785]
2025-12-04T15:17:11.9208115Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:17:11.9211208Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_einops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:17:11.920856]
2025-12-04T15:17:16.7922053Z 
2025-12-04T15:17:16.7923048Z dynamo/test_einops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_einops_1.1_fa1def1006f21bae_.log
2025-12-04T15:17:16.7924924Z Running 3 items in this shard: test/dynamo/test_einops.py::TestEinops::test_functions_version_none, test/dynamo/test_einops.py::TestEinops::test_layers_version_none, test/dynamo/test_einops.py::TestEinops::test_no_recompile_on_lazy_state_version_none
2025-12-04T15:17:16.7926232Z 
2025-12-04T15:17:16.7926545Z Finished dynamo/test_einops 1/1 ... [2025-12-04 15:17:16.791967][21794.401876585], took 0.08min
2025-12-04T15:17:16.8229093Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_einops/dynamo.test_einops-c0dc34cc00c52c06.xml
2025-12-04T15:17:16.8606326Z Running inductor/test_external_callables 1/1 ... [2025-12-04 15:17:16.860327][21794.470233362]
2025-12-04T15:17:16.8606938Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:17:16.8610381Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_external_callables.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:17:16.860769]
2025-12-04T15:17:41.7124499Z 
2025-12-04T15:17:41.7125623Z inductor/test_external_callables 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_external_callables_1.1_532bdcfa274f54bc_.log
2025-12-04T15:17:41.7128798Z Running 3 items in this shard: test/inductor/test_external_callables.py::TestInductorExternalCallable::test_matmul_cpu, test/inductor/test_external_callables.py::TestInductorExternalCallable::test_matmul_cuda, test/inductor/test_external_callables.py::TestInductorExternalCallable::test_matmul_dup
2025-12-04T15:17:41.7130413Z 
2025-12-04T15:17:41.7130825Z Finished inductor/test_external_callables 1/1 ... [2025-12-04 15:17:41.712242][21819.322151085], took 0.41min
2025-12-04T15:17:41.7436169Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_external_callables/inductor.test_external_callables-00ffeed03000c0d3.xml
2025-12-04T15:17:41.8302778Z Running test_testing 1/1 ... [2025-12-04 15:17:41.829955][21819.439862858]
2025-12-04T15:17:41.8303297Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:17:41.8306228Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_testing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:17:41.830381]
2025-12-04T15:18:51.0526727Z 
2025-12-04T15:18:51.0528152Z test_testing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_testing_1.1_a28c99e40f247370_.log
2025-12-04T15:18:51.1800073Z Running 2074 items in this shard: test/test_testing.py::TestTestingCUDA::test_assertEqual_longMessage_cuda, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_bool, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float16, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float32, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_float64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int16, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int32, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int64, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_int8, test/test_testing.py::TestTestingCUDA::test_assertEqual_numpy_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_not_stop_common_distributed_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_device_type_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_cuda_assert_should_stop_common_utils_test_suite_cuda, test/test_testing.py::TestTestingCUDA::test_get_supported_dtypes_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_bool, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_float64, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int16, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int32, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int64, 
test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_int8, test/test_testing.py::TestTestingCUDA::test_isclose_atol_rtol_greater_than_zero_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_isclose_bool_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_complex_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_isclose_complex_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_isclose_equality_shortcut_cuda, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_float_cuda_float64, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int16, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int32, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int64, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_int8, test/test_testing.py::TestTestingCUDA::test_isclose_integer_cuda_uint8, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_complex128, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_complex64, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float16, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float32, test/test_testing.py::TestTestingCUDA::test_isclose_nan_equality_shortcut_cuda_float64, test/test_testing.py::TestTestingCUDA::test_setup_and_teardown_run_for_device_specific_tests_cuda, test/test_testing.py::TestTestingCUDA::test_supported_dtypes_abs_cuda, test/test_testing.py::TestFrameworkUtils::test_filtering_env_var, test/test_testing.py::TestAssertClose::test_bool, test/test_testing.py::TestAssertClose::test_default_tolerance_selection_mismatching_dtypes, test/test_testing.py::TestAssertClose::test_docstring_examples, test/test_testing.py::TestAssertClose::test_matching, 
test/test_testing.py::TestAssertClose::test_matching_atol, test/test_testing.py::TestAssertClose::test_matching_conjugate_bit, test/test_testing.py::TestAssertClose::test_matching_nan, test/test_testing.py::TestAssertClose::test_matching_nan_with_equal_nan, test/test_testing.py::TestAssertClose::test_matching_rtol, test/test_testing.py::TestAssertClose::test_meta, test/test_testing.py::TestAssertClose::test_mismatching_dtype, test/test_testing.py::TestAssertClose::test_mismatching_dtype_no_check, test/test_testing.py::TestAssertClose::test_mismatching_layout, test/test_testing.py::TestAssertClose::test_mismatching_layout_no_check, test/test_testing.py::TestAssertClose::test_mismatching_shape, test/test_testing.py::TestAssertClose::test_mismatching_stride, test/test_testing.py::TestAssertClose::test_mismatching_stride_no_check, test/test_testing.py::TestAssertClose::test_mismatching_types, test/test_testing.py::TestAssertClose::test_mismatching_types_subclasses, test/test_testing.py::TestAssertClose::test_mismatching_types_type_equality, test/test_testing.py::TestAssertClose::test_mismatching_values, test/test_testing.py::TestAssertClose::test_mismatching_values_atol, test/test_testing.py::TestAssertClose::test_mismatching_values_rtol, test/test_testing.py::TestAssertClose::test_none, test/test_testing.py::TestAssertClose::test_none_mismatch, test/test_testing.py::TestAssertClose::test_numpy, test/test_testing.py::TestAssertClose::test_only_atol, test/test_testing.py::TestAssertClose::test_only_rtol, test/test_testing.py::TestAssertClose::test_scalar, test/test_testing.py::TestAssertClose::test_unexpected_error_compare, test/test_testing.py::TestAssertClose::test_unexpected_error_originate, test/test_testing.py::TestAssertClose::test_unknown_layout, test/test_testing.py::TestAssertClose::test_unknown_type, test/test_testing.py::TestAssertCloseMultiDeviceCUDA::test_mismatching_device_cuda, 
test/test_testing.py::TestAssertCloseMultiDeviceCUDA::test_mismatching_device_no_check_cuda, test/test_testing.py::TestAssertCloseErrorMessage::test_abs_diff, test/test_testing.py::TestAssertCloseErrorMessage::test_abs_diff_scalar, test/test_testing.py::TestAssertCloseErrorMessage::test_atol, test/test_testing.py::TestAssertCloseErrorMessage::test_identifier_scalars, test/test_testing.py::TestAssertCloseErrorMessage::test_identifier_tensor_likes, test/test_testing.py::TestAssertCloseErrorMessage::test_mismatched_elements, test/test_testing.py::TestAssertCloseErrorMessage::test_msg_callable, test/test_testing.py::TestAssertCloseErrorMessage::test_msg_str, test/test_testing.py::TestAssertCloseErrorMessage::test_not_close, test/test_testing.py::TestAssertCloseErrorMessage::test_not_equal, test/test_testing.py::TestAssertCloseErrorMessage::test_rel_diff, test/test_testing.py::TestAssertCloseErrorMessage::test_rel_diff_scalar, test/test_testing.py::TestAssertCloseErrorMessage::test_rtol, test/test_testing.py::TestAssertCloseErrorMessage::test_small_float_dtype, test/test_testing.py::TestAssertCloseErrorMessage::test_zero_div_zero, test/test_testing.py::TestAssertCloseContainer::test_mapping_mismatching_keys, test/test_testing.py::TestAssertCloseContainer::test_mapping_mismatching_values_msg, test/test_testing.py::TestAssertCloseContainer::test_sequence_mismatching_len, test/test_testing.py::TestAssertCloseContainer::test_sequence_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCOO::test_matching_coalesced, test/test_testing.py::TestAssertCloseSparseCOO::test_matching_uncoalesced, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_indices_msg, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_nnz, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_sparse_dims, test/test_testing.py::TestAssertCloseSparseCOO::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_matching, 
test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_col_indices_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_crow_indices_msg, test/test_testing.py::TestAssertCloseSparseCSR::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_matching, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_ccol_indices_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_row_indices_msg, test/test_testing.py::TestAssertCloseSparseCSC::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_matching, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_col_indices_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_crow_indices_msg, test/test_testing.py::TestAssertCloseSparseBSR::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_matching, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_ccol_indices_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_row_indices_msg, test/test_testing.py::TestAssertCloseSparseBSC::test_mismatching_values_msg, test/test_testing.py::TestAssertCloseQuantized::test_matching_per_channel, test/test_testing.py::TestAssertCloseQuantized::test_matching_per_tensor, test/test_testing.py::TestAssertCloseQuantized::test_mismatching_is_quantized, test/test_testing.py::TestAssertCloseQuantized::test_mismatching_qscheme, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_exclude_zero_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high0_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high1_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types0_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_ge_high_low_high2_value_types3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_boolean_integral2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_default_smoke_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_nan_low_high2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_outside_valid_range_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_low_high_smoke_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_int8, 
test/test_testing.py::TestMakeTensorCUDA::test_memory_format_memory_format_and_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_memory_format_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape1_cuda_uint8, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_int8, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape5_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_False_shape6_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape0_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape1_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape2_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape3_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape4_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape5_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_noncontiguous_noncontiguous_True_shape6_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex128, 
test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_requires_grad_requires_grad_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape0_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float16, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape1_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_bool, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int32, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape2_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_complex64, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape3_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_bfloat16, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape4_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int16, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape5_splat_shape_True_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex32, 
test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_False_cuda_uint8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_bfloat16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_bool, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex128, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_complex64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_float64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int16, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int32, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int64, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_int8, test/test_testing.py::TestMakeTensorCUDA::test_smoke_shape6_splat_shape_True_cuda_uint8, 
test/test_testing.py::TestTestParametrization::test_apply_param_specific_decorators, test/test_testing.py::TestTestParametrization::test_compose_param_specific_decorators, test/test_testing.py::TestTestParametrization::test_default_names, test/test_testing.py::TestTestParametrization::test_modules_decorator_misuse_error, test/test_testing.py::TestTestParametrization::test_multiple_handling_of_same_param_error, test/test_testing.py::TestTestParametrization::test_name_fn, test/test_testing.py::TestTestParametrization::test_ops_decorator_misuse_error, test/test_testing.py::TestTestParametrization::test_reparametrize, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_1, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_2, test/test_testing.py::TestTestParametrization::test_subtest_expected_failure_x_3, test/test_testing.py::TestTestParametrization::test_subtest_names, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_1_y_6, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_2_y_6, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_4, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_5, test/test_testing.py::TestTestParametrization::test_two_things_subtest_expected_failure_x_3_y_6, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_default_name_non_primitive_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_default_names_cuda, 
test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_dtypes_composition_invalid_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_dtypes_composition_valid_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_empty_param_list_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_empty_param_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_modules_composition_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_modules_decorator_applies_module_and_param_specific_decorators_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_multiple_handling_of_same_param_error_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_name_fn_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_ops_composition_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_ops_decorator_applies_op_and_param_specific_decorators_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_param_specific_decoration_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_1_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_2_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_expected_failure_x_3_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_subtest_names_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_4_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_1_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_4_cuda, 
test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_2_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_4_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_5_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_two_things_subtest_expected_failure_x_3_y_6_cuda, test/test_testing.py::TestTestParametrizationDeviceTypeCUDA::test_unparametrized_names_cuda, test/test_testing.py::TestImports::test_circular_dependencies, test/test_testing.py::TestImports::test_lazy_imports_are_lazy, test/test_testing.py::TestImports::test_no_mutate_global_logging_on_import_path_functorch, test/test_testing.py::TestImports::test_no_mutate_global_logging_on_import_path_torch, test/test_testing.py::TestImports::test_no_warning_on_import, test/test_testing.py::TestImports::test_not_import_sympy, test/test_testing.py::TestOpInfos::test_sample_input, test/test_testing.py::TestOpInfos::test_sample_input_metadata, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_T_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___radd___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rand___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rdiv___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rmod___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rmul___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___ror___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rpow___cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rsub___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators___rxor___cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators__chunk_cat_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_amax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_amin_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_aminmax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_arange_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_as_strided_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_atan2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bernoulli_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_and_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_left_shift_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_or_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_right_shift_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bitwise_xor_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_bucketize_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cat_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cauchy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_clamp_max_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_clamp_min_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_complex_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_copysign_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_cov_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diag_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diag_embed_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diagonal_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diagonal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_diff_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_floor_rounding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_no_rounding_mode_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_div_trunc_rounding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_dstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_empty_permuted_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_eq_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_exponential_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_eye_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fft2_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_fftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_hfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ifftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_ihfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_irfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfft2_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfft_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fft_rfftn_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fliplr_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_flipud_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_float_power_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_floor_divide_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmax_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmin_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_fmod_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gather_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gcd_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ge_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_geometric_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gradient_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_gt_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_heaviside_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_histogramdd_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_hypot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_igamma_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_igammac_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_index_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_index_select_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_isclose_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_item_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_jiterator_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_jiterator_binary_return_by_ref_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_kthvalue_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_lcm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ldexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_le_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_cross_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_diagonal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_lstsq_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linalg_lstsq_grad_oriented_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linspace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_linspace_tensor_overload_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_log_normal_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logaddexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logcumsumexp_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_and_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_or_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logical_xor_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logspace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_logspace_tensor_overload_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_lt_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_fill_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_masked_select_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_max_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_maximum_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_mean_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_median_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_min_binary_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_minimum_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_movedim_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_mul_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_multinomial_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_narrow_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_narrow_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_native_layer_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ne_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_neg_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nextafter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_avg_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_adaptive_max_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_avg_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_conv3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_embedding_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_gaussian_nll_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_gelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_group_norm_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_hardtanh_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_hinge_embedding_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_huber_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_l1_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_margin_ranking_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool1d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool2d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_max_pool3d_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_multi_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_multilabel_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_poisson_nll_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_prelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_rms_norm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_rrelu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_soft_margin_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_softshrink_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_triplet_margin_loss_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_normal_in_place_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_ormqr_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_polar_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_pow_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_remainder_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_renorm_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_reshape_as_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_reshape_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_roll_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_rot90_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_rsub_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_scatter_add_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_scatter_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_bartlett_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_blackman_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_cosine_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_exponential_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_gaussian_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_general_cosine_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_general_hamming_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_hamming_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_hann_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_kaiser_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_signal_windows_nuttall_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_u_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_v_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_chebyshev_polynomial_w_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_hermite_polynomial_h_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_hermite_polynomial_he_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_laguerre_polynomial_l_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_legendre_polynomial_p_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_u_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_v_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_shifted_chebyshev_polynomial_w_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_xlog1py_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_special_zeta_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_sub_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_sum_to_size_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_t_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_t_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_take_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_trace_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_tril_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_triu_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_true_divide_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_unbind_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_unbind_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_uniform_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vdot_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_as_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_copy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_view_cuda, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vsplit_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_vstack_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_where_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_error_generators_xlogy_cuda, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___radd___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rand___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rdiv___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rmod___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___ror___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rpow___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rsub___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators___rxor___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_abs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_acos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_acosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_addcdiv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_addcmul_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_angle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_asin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_asinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atan2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_atanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bfloat16_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_and_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_left_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_not_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_or_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_right_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bitwise_xor_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bool_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_broadcast_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_bucketize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_byte_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cat_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cdouble_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ceil_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cfloat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_chalf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_char_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_max_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clamp_min_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_clone_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_conj_physical_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_contiguous_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_copysign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_cosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_deg2rad_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diag_embed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diagonal_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_digamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_floor_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_no_rounding_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_div_trunc_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_double_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_empty_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_eq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erfc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_erfinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_exp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_expm1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_flatten_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_float_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_float_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_floor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_floor_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_fmod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_frac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_frexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_gcd_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ge_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_gt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_half_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_heaviside_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_hypot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_igamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_igammac_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_imag_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_index_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_int_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isfinite_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isnan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isneginf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isposinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_isreal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_binary_return_by_ref_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_jiterator_unary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lcm_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ldexp_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_le_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lgamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log10_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log1p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_log_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_and_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_not_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_or_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logical_xor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_long_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_lt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_max_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_maximum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_min_binary_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_minimum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_movedim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nan_to_num_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_narrow_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_narrow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_ne_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nextafter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_celu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_elu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_grid_sample_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_group_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardshrink_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hardtanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_hinge_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_interpolate_bicubic_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_interpolate_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_logsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_margin_ranking_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_mish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_multi_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_multilabel_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_prelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_relu6_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_rrelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_selu_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_silu_complex_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_silu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softplus_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_softsign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_tanhshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_threshold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_nn_functional_upsample_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_permute_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_permute_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polar_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_polygamma_polygamma_n_4_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_positive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_pow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rad2deg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_real_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reciprocal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_remainder_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reshape_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_reshape_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_round_decimals_neg_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rsqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_rsub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sgn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_short_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sign_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_bartlett_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_blackman_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_gaussian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_general_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_general_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_hann_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_kaiser_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signal_windows_nuttall_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_signbit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sinc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_airy_ai_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_j1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_y0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_bessel_y1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_entr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_erfcx_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_hermite_polynomial_h_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_hermite_polynomial_he_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i0e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_i1e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_laguerre_polynomial_l_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_legendre_polynomial_p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_log_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_ndtri_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_scaled_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_scaled_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_w_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_spherical_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_xlog1py_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_special_zeta_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_square_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_sub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_tan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_tanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_true_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_trunc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_unsafe_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_view_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_view_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_where_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_reference_generators_xlogy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_H_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_T_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___getitem___cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___radd___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rand___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rdiv___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmatmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmod___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rmul___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___ror___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rpow___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rsub___cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators___rxor___cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__batch_norm_with_update_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__chunk_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__native_batch_norm_legit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__segment_reduce_lengths_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__segment_reduce_offsets_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__softmax_backward_data_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__unsafe_masked_index_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__unsafe_masked_index_put_accumulate_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators__upsample_bilinear2d_aa_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_abs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_acos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_acosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addbmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addcdiv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addcmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmm_decomposed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addmv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_addr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_alias_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_all_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_allclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_amin_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_aminmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_angle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_any_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_arange_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argsort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_argwhere_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_partial_views_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_as_strided_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_asin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_asinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atan2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_1d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_atleast_3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_baddbmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bernoulli_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bfloat16_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bincount_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_and_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_left_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_not_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_or_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_right_shift_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bitwise_xor_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_block_diag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bool_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_shapes_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_broadcast_to_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_bucketize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_byte_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cartesian_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cauchy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cdist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cdouble_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ceil_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cfloat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_chalf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_char_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_inverse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cholesky_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_max_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clamp_min_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_clone_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_column_stack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_combinations_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_conj_physical_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_constant_pad_nd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_contiguous_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_copysign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_corrcoef_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cos_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cosh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_count_nonzero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cov_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cross_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cummax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cummin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumprod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumsum_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_cumulative_trapezoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_deg2rad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diag_embed_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagflat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diagonal_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_diff_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_digamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_floor_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_no_rounding_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_div_trunc_rounding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_double_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_dstack_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_einsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_permuted_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_empty_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_eq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_equal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erfc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_erfinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expand_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_expm1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_eye_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fft2_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_fftshift_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_hfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ifftshift_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_ihfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_irfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfft2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfft_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fft_rfftn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flip_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fliplr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_flipud_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_float_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_float_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_floor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_floor_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_fmod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_frac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_frexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_full_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_full_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gather_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gcd_cuda_int64, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ge_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_geometric_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_geqrf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gradient_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_grid_sampler_2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_grid_sampler_3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_gt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_half_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hash_tensor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_heaviside_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_histc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_hypot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_igamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_igammac_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_imag_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_add_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_put_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_reduce_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_index_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_inner_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_int_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isclose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isfinite_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isnan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isneginf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isposinf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_isreal_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_istft_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_item_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_2inputs_2outputs_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_4inputs_with_extra_args_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_binary_return_by_ref_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_jiterator_unary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_kron_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_kthvalue_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lcm_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ldexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_le_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lerp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lgamma_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cholesky_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cholesky_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cond_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_cross_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_det_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_diagonal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eig_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigvals_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_eigvalsh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_householder_product_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_inv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_inv_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_factor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_factor_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_ldl_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lstsq_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lstsq_grad_oriented_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_factor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_factor_ex_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_lu_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_power_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_rank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_matrix_rank_hermitian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_multi_dot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_norm_subgradients_at_zero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_hermitian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_pinv_singular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_qr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_slogdet_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_ex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_solve_triangular_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_svd_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_svdvals_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_tensorinv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_tensorsolve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vander_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vecdot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linalg_vector_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linspace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_linspace_tensor_overload_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log10_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log1p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_normal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_log_softmax_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logaddexp2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logcumsumexp_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logdet_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_and_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_not_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_or_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logical_xor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logspace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logspace_tensor_overload_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_long_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_lu_unpack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mH_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mT_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_amin_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_argmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_argmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_cumprod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_cumsum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_fill_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_log_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_logaddexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_logsumexp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_median_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_normalize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_softmin_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_std_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_masked_var_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_matmul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_matrix_exp_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_pool2d_with_indices_backward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_reduction_no_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_max_reduction_with_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_maximum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_median_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_meshgrid_list_of_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_meshgrid_variadic_tensors_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_binary_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_reduction_no_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_min_reduction_with_dim_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_minimum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mode_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_movedim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_msort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mul_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_multinomial_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mv_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nan_to_num_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanmean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanmedian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nanquantile_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nansum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_narrow_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_narrow_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_batch_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_dropout_backward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_native_layer_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ne_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_empty_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_empty_strided_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_full_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_ones_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_new_zeros_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nextafter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_adaptive_max_pool3d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_alpha_dropout_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_avg_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_batch_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_binary_cross_entropy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_celu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_channel_shuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose2d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_conv_transpose3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cosine_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cosine_similarity_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_cross_entropy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_ctc_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_dropout_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_elu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_embedding_bag_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_embedding_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_fractional_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_fractional_max_pool3d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_gaussian_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_gelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_glu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_grid_sample_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_group_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardswish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hardtanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_hinge_embedding_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_huber_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_instance_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_area_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_bicubic_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_bilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_linear_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_nearest_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_interpolate_trilinear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_kl_div_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_l1_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_layer_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_leaky_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_linear_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_local_response_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_logsigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_margin_ranking_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool1d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_pool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool1d_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool1d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool2d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool2d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool3d_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_max_unpool3d_grad_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_mish_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_mse_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multi_head_attention_forward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multi_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multilabel_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_normalize_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_one_hot_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_circular_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_constant_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_reflect_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_replicate_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pad_replicate_negative_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pairwise_distance_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pdist_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pixel_shuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_pixel_unshuffle_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_poisson_nll_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_prelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_relu6_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_relu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_rms_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_rrelu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_selu_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_silu_complex_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_silu_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_smooth_l1_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_soft_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softmin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softmin_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softplus_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_softsign_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_tanhshrink_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_threshold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_triplet_margin_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_unfold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_upsample_bilinear_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nn_functional_upsample_nearest_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nonzero_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_nonzero_static_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_fro_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_inf_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_norm_nuc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_in_place_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_normal_number_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ones_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ones_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ormqr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_outer_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pca_lowrank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_permute_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_permute_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pinverse_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polar_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_2_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_polygamma_polygamma_n_4_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_positive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_pow_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_put_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_qr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_quantile_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rad2deg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rand_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randint_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randint_like_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_randn_like_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_ravel_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_real_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reciprocal_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_remainder_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_renorm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_repeat_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_repeat_interleave_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reshape_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_reshape_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resize__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resize_as__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resolve_conj_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_resolve_neg_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_roll_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rot90_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_3_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_round_decimals_neg_3_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rsqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_rsub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scalar_tensor_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_add_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_amax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_amin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_prod_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_scatter_reduce_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_searchsorted_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_select_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_select_scatter_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sgn_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_short_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sigmoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sign_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_bartlett_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_blackman_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_exponential_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_gaussian_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_general_cosine_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_general_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_hamming_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_hann_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_kaiser_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signal_windows_nuttall_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_signbit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sin_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sinc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sinh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_slice_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_slice_scatter_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_softmax_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_softmax_with_dtype_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sort_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sparse_mm_reduce_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sparse_sampled_addmm_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_airy_ai_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_j1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_y0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_bessel_y1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_entr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_erfcx_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_hermite_polynomial_h_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_hermite_polynomial_he_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i0e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_i1e_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_laguerre_polynomial_l_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_legendre_polynomial_p_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_log_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_i0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_i1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_modified_bessel_k1_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_ndtr_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_ndtri_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_scaled_modified_bessel_k0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_scaled_modified_bessel_k1_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_spherical_bessel_j0_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_xlog1py_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_special_zeta_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_list_args_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_with_sizes_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_split_with_sizes_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sqrt_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_square_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_squeeze_multiple_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_stack_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_mean_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_mean_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_std_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_stft_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sub_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sum_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_sum_to_size_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_svd_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_svd_lowrank_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_t_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_t_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_take_along_dim_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_take_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tan_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tanh_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tensor_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tensordot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tile_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_to_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_to_sparse_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_topk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch__scaled_mm_cuda_float8_e4m3fn, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch__scaled_mm_v2_cuda_float8_e4m3fn, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__flash_attention_forward_cuda_float16, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_torch_ops_aten__safe_softmax_default_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trace_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_transpose_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_transpose_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trapezoid_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trapz_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triangular_solve_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tril_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_tril_indices_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triu_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_triu_indices_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_true_divide_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_trunc_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unbind_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unbind_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unflatten_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unfold_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unfold_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_uniform_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unique_consecutive_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unique_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unravel_index_cuda_int64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsafe_chunk_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsafe_split_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsqueeze_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_unsqueeze_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_mean_cuda_float32, 
test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_mean_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_var_unbiased_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vdot_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_complex_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_as_real_cuda_complex64, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_copy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_view_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vsplit_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_vstack_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_where_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_xlogy_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zero__cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zeros_cuda_float32, test/test_testing.py::TestOpInfoSampleFunctionsCUDA::test_opinfo_sample_generators_zeros_like_cuda_float32
2025-12-04T15:18:51.2824883Z 
2025-12-04T15:18:51.2825199Z Finished test_testing 1/1 ... [2025-12-04 15:18:51.056592][21888.666494533], took 1.15min
2025-12-04T15:18:51.2826506Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_testing/test_testing-69992b4cd6aabeac.xml
2025-12-04T15:18:51.2827654Z Running dynamo/test_fx_passes_pre_grad 1/1 ... [2025-12-04 15:18:51.230380][21888.840286167]
2025-12-04T15:18:51.2828222Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:18:51.2829444Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_fx_passes_pre_grad.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:18:51.230832]
2025-12-04T15:18:59.7571596Z 
2025-12-04T15:18:59.7572978Z dynamo/test_fx_passes_pre_grad 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_fx_passes_pre_grad_1.1_7c7f9dd585a9f6c9_.log
2025-12-04T15:18:59.7574342Z Running 1 items in this shard: test/dynamo/test_fx_passes_pre_grad.py::FxPassesPreGradTests::test_pass_execution_and_save
2025-12-04T15:18:59.7574941Z 
2025-12-04T15:18:59.7575311Z Finished dynamo/test_fx_passes_pre_grad 1/1 ... [2025-12-04 15:18:59.756946][21897.366856169], took 0.14min
2025-12-04T15:18:59.7879994Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-48a63e950c2eb9b4.xml
2025-12-04T15:18:59.8651003Z Running export/test_strict_export_v2 1/1 ... [2025-12-04 15:18:59.864753][21897.474659718]
2025-12-04T15:18:59.8651617Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:18:59.8654279Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_strict_export_v2.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:18:59.865154]
2025-12-04T15:21:08.5161783Z 
2025-12-04T15:21:08.5162852Z export/test_strict_export_v2 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_strict_export_v2_1.1_3c4ed2fe1af04b4b_.log
2025-12-04T15:21:08.5409617Z Running 440 items in this shard: test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_assume_static_by_default_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_constraints_error_not_in_range_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_constraints_error_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_inline_constraints_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_slice_maxsize_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_slice_unbacked_dim1_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_export_strict_narrow_unbacked_expr_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_no_grad_param_inplace_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestDynamismExpression::test_reshape_view_backed_size_oblivious_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test__scaled_dot_product_flash_attention_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_additional_inputs_constants_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_allow_explicit_guards_as_runtime_asserts_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_annotate_on_assert_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_args_type_checked_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_aten_lift_fresh_copy_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_attention_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_attr_assignment_extra_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_automatic_constrain_size_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_automatic_dynamic_shapes_constant_relation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_automatic_dynamic_shapes_linear_relation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_automatic_dynamic_shapes_simple_equality_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_baddbmm_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_basic_non_strict_fake_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_basic_non_strict_real_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_bincount_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_buffer_util_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_capture_subclass_constructor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_capture_subclass_constructor_torch_ir_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_capture_subclass_wrong_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_ccode_python_mod_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cdist_forward_compute_mode_zero_export_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_check_specialized_int_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_checks_to_constrain_range_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cleanup_dynamic_markers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_colin_unbacked_backed_vr_sub_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_colon_parameter_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_compiling_state_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_access_identical_symint_closure_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_branches_return_constant_int_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_branches_return_same_int_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_buffers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_contains_unbacked_no_escape_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_int_closure_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_unflatten_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_with_module_stack_export_with_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cond_with_module_stack_export_with_unflatten_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_aliasing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_input_naming_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_no_user_inp_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_output_dup_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_output_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_requires_grad_const_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_return_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_tensor_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_tensor_with_non_functional_nested_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constant_tensor_with_non_functional_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constrain_decomp_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constrain_size_in_eager_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constrain_size_with_constrain_value_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_constrain_size_with_various_cases_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_conv_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_crop_like_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_cse_for_symint_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_op_auto_functionalize_pre_dispatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_op_auto_functionalize_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_op_auto_warn_pre_dispatch_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_op_preserve_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_pytree_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_custom_tag_metadata_re_export_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_decomp_batch_norm_functional_predispatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_decomp_item_in_prim_after_decomposition_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_decomp_item_in_prim_before_decomposition_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_default_decomposition_core_cia_ops_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_1_2_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_integer_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_nested_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_out_of_order_repeat_derived_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_out_of_order_simplified_repeat_non_derived_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_out_of_order_simplified_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_out_of_order_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_derived_dim_repeat_derived_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_detect_leak_nonstrict_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_detect_leak_nonstrict_with_stacktrace_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_detect_leak_strict_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_device_to_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_device_to_gpu_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_device_to_mutation_float_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_device_to_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_device_to_static_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_1_2_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_auto_and_dim_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_dynamic_divisibility_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_dynamic_specialization_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_hint_range_violations_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dim_hint_ranges_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_disable_forced_specializations_errors_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_disable_forced_specializations_ok_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_distributed_all_gather_into_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_distributed_all_gather_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_distributed_all_reduce_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_distributed_all_to_all_single_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_distributed_reduce_scatter_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dont_duck_size_for_auto_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_double_lifted_constants_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_checks_aliasing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_checks_mutation_list_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_checks_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_checks_mutation_with_nan_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_fake_kernel_inference_errors_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_draft_export_infers_fake_kernel_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_duplicate_modules_with_non_persistent_buffers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_lr_shift_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_bounds_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_builder_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_builder_kwargs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_builder_pytree_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_dataclass_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_inferred_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_serdes_generic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_serdes_user_errors_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_serdes_various_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_spec_with_pytree_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_shapes_wrapped_with_shape_guards_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_dynamic_sym_round_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_ends_of_bounds_oblivious_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_enum_str_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_error_does_not_reference_eager_fallback_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_error_when_passing_mutating_primitive_op_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_exception_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_expand_copy_export_handles_implicit_true_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_api_with_dynamic_shapes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_as_backend_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_associative_scan_lifted_buffers_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_associative_scan_symbol_dim_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_associative_scan_symbol_scandim_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_aten_to_unflatten_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_aten_to_unflatten_subclass_pre_dispatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_aten_to_unflatten_subclass_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_cond_preserve_torch_fn_for_subgraphs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_cond_symbool_pred_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_cond_warns_constant_pred_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_custom_decomp_table_basic_pop_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_custom_decomp_table_container_methods_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_custom_op_lib_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_custom_triton_kernel_mutable_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_custom_triton_kernel_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_cyclic_reference_leak_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_decomp_torture_case_1_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_decomp_torture_case_2_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_decomps_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_decomps_simple_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_dynamo_config_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_for_training_run_decomp_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_for_training_with_container_type_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_for_training_with_dynamic_shapes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_for_training_with_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_for_training_with_state_dict_hooks_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_default_kwargs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_keyword_only_args_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_kwargs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_pytree_kwargs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_var_keyword_args_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_var_keyword_pytree_args_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_func_with_var_postional_args_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_function_schema_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_graph_with_no_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_input_mutation_bug_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_input_mutation_dynamic_shape_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_input_mutation_static_shape_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_leak_compile_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_linear_preserve_dynamic_shape_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_max_nonstrict_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_max_onnx_reported_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_method_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_mod_constraints_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_module_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_preserve_linear_at_aot_level_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_preserve_linear_but_not_custom_op_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_rnn_variants_with_warning_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_scan_pytree_output_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_script_module_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_statically_known_true_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_then_compile_tensor_ctor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_autocast_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_fake_tensor_inputs_on_cuda_devices_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_fake_tensor_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_inline_constraints_complex_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_inline_constraints_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_set_grad_enabled_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_export_with_wrong_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_external_call_non_strict_real_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_fake_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_fake_weights_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_filter_traceback_frames_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_flex_attention_export_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_float_conversion_from_int_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_float_conversion_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_fqn_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_from_node_metadata_export_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_full_on_scalar_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_function_holding_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_hints_wrapper_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_hoo_inline_users_issue_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_if_functional_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_if_post_autograd_op_preserved_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_inductor_backend_inside_nonstrict_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_inline_script_class_method_recursive_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_inline_script_class_method_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_inline_script_function_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_inline_script_method_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_int_shape_specialization_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_intermediate_shape_comp_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_invalid_pytree_dynamo_graph_capture_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_is_exporting_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_is_nonzero_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_isnonzero_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_issue_113041_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_issue_157289_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_issue_161902_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_istft_op_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_keep_composite_ops_invalid_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_keep_composite_ops_linear_convd_for_training_ir_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_keep_composite_ops_linear_convd_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_kwarg_dynamic_shapes_diff_order_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_kwargs_reorder_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_layer_norm_unbacked_normalized_shape_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_layer_sharing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_lazy_module_kwargs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_lifted_constants_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_linear_conv_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_malformed_fqn_from_source_name_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_map_buffers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_map_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_mask_nonzero_static_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_masked_select_dynamic_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_math_pow_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_mismatched_dynamic_shapes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_mixed_input_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_dict_key_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_input_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_input_subclasses_parameterization_nested_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_list_slice_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_module_with_dict_container_inp_out_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_modules_access_for_deleted_submodule_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_more_multidimensional_slicing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_multidimensional_slicing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_multinomial_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_multiple_definitions_same_name_dim_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_namedtuple_input_export_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_native_multi_attention_head_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_dynamic_shapes_spec_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_module_fake_tensor_leak_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_module_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_module_with_constant_buffer_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_module_with_init_buffer_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nested_module_with_parameter_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nn_module_stack_shared_submodule_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nn_module_stack_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_check_is_size_error_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_suggested_fixes_for_data_dependent_errors_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_tensor_computation_2_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_tensor_computation_3_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_tensor_computation_4_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_no_tensor_computation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_arg_name_dynamic_shapes_api_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_arg_name_dynamic_shapes_api_with_container_type_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_arg_name_dynamic_shapes_api_with_kwarg_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_persistent_buffer_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_strict_dynamic_shapes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_non_strict_dynamic_shapes_suggested_fixes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_none_buffers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nonstrict_retrace_preserves_metadata_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nonzero_2_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_nonzero_dynamic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_not_registered_parameter_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_operator_aten_tensor_mode_variant_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_output_node_name_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_pad_sequence_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_param_util_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_partial_patched_forward_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_placeholder_naming_collisions_hoo_subgraphs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_placeholder_naming_collisions_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_placeholder_naming_order_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_placeholder_naming_order_variadic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_placeholder_update_preserving_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_predispatch_cond_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_predispatch_grad_wrappers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_preserve_annotation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_preserve_module_call_signature_unflatten_specialization_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_preserve_requires_grad_placeholders_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_preserve_shape_dynamism_for_unused_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_profiling_code_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_python_asserts_with_sym_int_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_pytree_register_data_class_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_pytree_register_nested_data_class_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_raise_user_error_when_guard_on_data_dependent_operation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_range_constraints_with_replacement_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_real_tensor_alias_dtype_mismatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_real_tensor_bool_cast_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_real_tensor_errors_on_aliasing_custom_op_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_real_tensor_for_max_op_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_real_tensor_size_mismatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_redundant_assert_max_upper_bound_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_redundant_asserts_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_refine_dynamic_shapes_from_suggested_fixes_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_register_constant_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_repeat_interleave_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_replace_unbacked_with_very_large_upperbound_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_replaced_unbacked_bindings_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_reshape_view_helper_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_retracable_ep_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_retrace_pre_autograd_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_run_decomposition_supports_user_input_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_run_decompositions_keep_metadata_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_run_decompositions_keep_tensor_constant_metadata_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_runtime_assert_for_prim_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_runtime_assert_for_prm_str_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_runtime_assert_with_size_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_sdpa_gqa_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_sequential_slicing_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_set_example_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_set_grad_as_side_effect_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_set_grad_empty_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_set_grad_unflatten_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_setgrad_lifted_tensor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_shared_submodule_nn_module_stack_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_simple_export_for_training_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_simple_unbacked_view_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_size_input_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_slice_nn_module_stack_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_solver_unsupported_sympy_function_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_specialize_derived_dim_roots_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_split_const_gm_with_lifted_constants_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_stack_trace_make_fx_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_stack_trace_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_state_primitives_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_state_shape_attribute_assignment_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_state_tensors_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_static_dim_constraints_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_context_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_nested_attr_access_complicated_metadata_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_nested_attr_access_const_metadata_not_top_level_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_nested_attr_access_const_metadata_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_nested_attr_access_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclass_nested_attr_access_submodule_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclasses_parameterization_nested_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_subclasses_parameterization_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_suggest_torch_checks_with_non_negative_check_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_suggest_torch_checks_with_regular_check_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_suggested_fixes_for_data_dependent_errors_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_suggested_fixes_for_data_dependent_errors_puzzlers_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_suggested_fixes_new_roots_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_sym_float_operators_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_sym_or_sym_and_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_sym_sqrt_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symbool_item_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symfloat_item_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_input_additional_inputs_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_input_basic_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_input_ranges_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_input_shapes_collection_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_input_specialization_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_item_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_output_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_symint_tensor_return_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tag_ac_export_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tensor_attribute_zero_args_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tensor_constant_aten_to_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tensor_constant_with_wrapped_method_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_to_module_with_mutated_buffer_multiple_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_to_module_with_mutated_buffer_multiple_update_sub_later_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_to_module_with_mutated_buffer_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tolist_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_torch_check_eq_commutativity_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_torch_fn_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_trace_under_fake_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_train_eval_on_exported_preautograd_module_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_tril_dynamic_diagonal_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_triu_dynamic_diagonal_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_3d_matmul_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_bincount_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_bindings_for_divisible_u_symint_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_deferred_runtime_retrace_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_expand_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_infer_size_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_kth_value_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_linear_layer_norm_input_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_noncontig_lin_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_pad_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_scalar_constructor_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_slice_forward_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_slice_simple_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_stack_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_to_cond_passthrough_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_to_cond_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unbacked_unsqueeze_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_asserts_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_buffer_update_child2parent_swap_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_closure_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_isinstance_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_multiple_graphs_dispatch_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_multiple_graphs_preserve_signature_no_error_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_multiple_graphs_shared_submodule_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_multiple_graphs_state_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_no_unroll_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_placeholder_update_child2parent_swap_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_placeholder_update_grandchild2cousin_swap_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_5_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_6_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_buf_8_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_const_preserving_3_1_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_const_preserving_3_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_4_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_6_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_9_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_preserving_10_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_preserving_4_1_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_preserving_4_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_preserving_5_strict_export_v2, 
test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_mutating_buf_preserving_7_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unflatten_random_dag_preserving_4_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unused_aliases_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_unused_constant_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_uplift_common_custom_meta_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_uplift_common_custom_meta_with_multiple_calls_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_use_embedding_twice_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_user_input_and_buffer_mutation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_vmap_custom_autograd_function_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_vmap_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_vmap_to_assert_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_where_decomp_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_while_loop_assert_separation_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_while_loop_index_assertions_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_while_loop_simple_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_while_loop_tensor_constant_idx_strict_export_v2, test/export/test_strict_export_v2.py::StrictExportV2TestExport::test_wrapper_module_strict_export_v2
2025-12-04T15:21:08.5650822Z 
2025-12-04T15:21:08.5651234Z Finished export/test_strict_export_v2 1/1 ... [2025-12-04 15:21:08.517457][22026.127362825], took 2.14min
2025-12-04T15:21:08.5652555Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_strict_export_v2/export.test_strict_export_v2-e896fc6c8f5f5413.xml
2025-12-04T15:21:08.6396238Z Running export/test_functionalized_assertions 1/1 ... [2025-12-04 15:21:08.639266][22026.249172503]
2025-12-04T15:21:08.6397203Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:21:08.6399665Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_functionalized_assertions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:08.639691]
2025-12-04T15:21:13.9116116Z 
2025-12-04T15:21:13.9117373Z export/test_functionalized_assertions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_functionalized_assertions_1.1_7d17ab73392af6b4_.log
2025-12-04T15:21:13.9119735Z Running 2 items in this shard: test/export/test_functionalized_assertions.py::TestFuntionalAssertions::test_functional_assert_async_msg, test/export/test_functionalized_assertions.py::TestFuntionalAssertions::test_functional_sym_constrain_range
2025-12-04T15:21:13.9121180Z 
2025-12-04T15:21:13.9121597Z Finished export/test_functionalized_assertions 1/1 ... [2025-12-04 15:21:13.911393][22031.521302758], took 0.09min
2025-12-04T15:21:13.9429390Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_functionalized_assertions/export.test_functionalized_assertions-9948d5e6dd7869dd.xml
2025-12-04T15:21:13.9718910Z Running inductor/test_selective_lowering 1/1 ... [2025-12-04 15:21:13.971641][22031.581548786]
2025-12-04T15:21:13.9719496Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:21:13.9723161Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_selective_lowering.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:13.972029]
2025-12-04T15:21:32.7157630Z 
2025-12-04T15:21:32.7158716Z inductor/test_selective_lowering 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_selective_lowering_1.1_e1c78d2a5185c394_.log
2025-12-04T15:21:32.7160643Z Running 2 items in this shard: test/inductor/test_selective_lowering.py::SelectiveLoweringTest::test_basic_selective_lowering, test/inductor/test_selective_lowering.py::SelectiveLoweringTest::test_no_fallback_when_unmarked
2025-12-04T15:21:32.7161798Z 
2025-12-04T15:21:32.7162244Z Finished inductor/test_selective_lowering 1/1 ... [2025-12-04 15:21:32.715524][22050.325432969], took 0.31min
2025-12-04T15:21:32.7470654Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_selective_lowering/inductor.test_selective_lowering-3443f84bc8e0d9ea.xml
2025-12-04T15:21:32.8210165Z Running dynamo/test_base_output 1/1 ... [2025-12-04 15:21:32.820702][22050.430609748]
2025-12-04T15:21:32.8210766Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:21:32.8213641Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_base_output.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:32.821117]
2025-12-04T15:21:38.1431060Z 
2025-12-04T15:21:38.1431978Z dynamo/test_base_output 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_base_output_1.1_c6d6552f20e02364_.log
2025-12-04T15:21:38.1434591Z Running 6 items in this shard: test/dynamo/test_base_output.py::TestBaseOutput::test_assign, test/dynamo/test_base_output.py::TestBaseOutput::test_create, test/dynamo/test_base_output.py::TestBaseOutput::test_getattr, test/dynamo/test_base_output.py::TestBaseOutput::test_getitem, test/dynamo/test_base_output.py::TestBaseOutput::test_index, test/dynamo/test_base_output.py::TestBaseOutput::test_tuple
2025-12-04T15:21:38.1436540Z 
2025-12-04T15:21:38.1436881Z Finished dynamo/test_base_output 1/1 ... [2025-12-04 15:21:38.142918][22055.752827855], took 0.09min
2025-12-04T15:21:38.1746563Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_base_output/dynamo.test_base_output-444b9e9b2896f7db.xml
2025-12-04T15:21:38.2086013Z Running inductor/test_lookup_table 1/1 ... [2025-12-04 15:21:38.208356][22055.818263765]
2025-12-04T15:21:38.2086588Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:21:38.2090027Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_lookup_table.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:38.208744]
2025-12-04T15:21:47.8226363Z 
2025-12-04T15:21:47.8227542Z inductor/test_lookup_table 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_lookup_table_1.1_47a98ebb9baf620f_.log
2025-12-04T15:21:47.8228368Z 
2025-12-04T15:21:47.8228741Z Finished inductor/test_lookup_table 1/1 ... [2025-12-04 15:21:47.822403][22065.432312847], took 0.16min
2025-12-04T15:21:47.8548384Z Running export/test_serialize 1/1 ... [2025-12-04 15:21:47.854555][22065.464462681]
2025-12-04T15:21:47.8548955Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:21:47.8552036Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_serialize.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:21:47.854954]
2025-12-04T15:22:25.7739317Z 
2025-12-04T15:22:25.7740551Z export/test_serialize 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_serialize_1.1_aebb5c7eea9352a2_.log
2025-12-04T15:22:25.7787042Z Running 116 items in this shard: test/export/test_serialize.py::TestSerialize::test_1D_tensor_slicing, test/export/test_serialize.py::TestSerialize::test_2D_tensor_slicing, test/export/test_serialize.py::TestSerialize::test_canonicalize, test/export/test_serialize.py::TestSerialize::test_complex_constant, test/export/test_serialize.py::TestSerialize::test_empty_constant, test/export/test_serialize.py::TestSerialize::test_empty_state_dict, test/export/test_serialize.py::TestSerialize::test_export_example_inputs_preserved, test/export/test_serialize.py::TestSerialize::test_export_with_extension_op_serialization, test/export/test_serialize.py::TestSerialize::test_int_list, test/export/test_serialize.py::TestSerialize::test_kwargs_default, test/export/test_serialize.py::TestSerialize::test_metadata_parsing_with_layer_split, test/export/test_serialize.py::TestSerialize::test_metadata_run_decomp_serder, test/export/test_serialize.py::TestSerialize::test_multi_return_some_unused, test/export/test_serialize.py::TestSerialize::test_nested_layer_split, test/export/test_serialize.py::TestSerialize::test_non_float_weight, test/export/test_serialize.py::TestSerialize::test_nonfinite_inputs, test/export/test_serialize.py::TestSerialize::test_predispatch_export_with_autograd_op, test/export/test_serialize.py::TestSerialize::test_preserve_aliasing, test/export/test_serialize.py::TestSerialize::test_rational_ranges, test/export/test_serialize.py::TestSerialize::test_serialize_constant_outputs, test/export/test_serialize.py::TestSerialize::test_serialize_infinite_sym_int, test/export/test_serialize.py::TestSerialize::test_serialize_list_returns, test/export/test_serialize.py::TestSerialize::test_serialize_multiple_returns_from_node, test/export/test_serialize.py::TestSerialize::test_serialize_param_mutation, test/export/test_serialize.py::TestSerialize::test_serialize_sym_float, test/export/test_serialize.py::TestSerialize::test_serialize_sym_int, 
test/export/test_serialize.py::TestSerialize::test_storage_offset, test/export/test_serialize.py::TestSerialize::test_symint_list, test/export/test_serialize.py::TestSerialize::test_triton_hop, test/export/test_serialize.py::TestSerialize::test_weight_sharing_gpu, test/export/test_serialize.py::TestDeserialize::test_arg_from, test/export/test_serialize.py::TestDeserialize::test_auto_functionalize, test/export/test_serialize.py::TestDeserialize::test_basic, test/export/test_serialize.py::TestDeserialize::test_cond, test/export/test_serialize.py::TestDeserialize::test_constraints, test/export/test_serialize.py::TestDeserialize::test_custom_obj, test/export/test_serialize.py::TestDeserialize::test_custom_obj_list_out, test/export/test_serialize.py::TestDeserialize::test_custom_obj_tuple_out, test/export/test_serialize.py::TestDeserialize::test_device, test/export/test_serialize.py::TestDeserialize::test_dynamic, test/export/test_serialize.py::TestDeserialize::test_export_no_inputs, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_assume_constant_result, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_autograd_function, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_class_method, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_branch_class_method, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_branch_nested_function, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_branch_nonlocal_variables, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_closed_over_variable, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_operands, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_cond_predicate, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_constrain_as_size_example, 
test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_constrain_as_value_example, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_decorator, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dictionary, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_assert, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_constructor, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_if_guard, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_map, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_slicing, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_dynamic_shape_view, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_fn_with_kwargs, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_list_contains, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_list_unpack, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_model_attr_mutation, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_nested_function, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_null_context_manager, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_optional_input, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_pytree_flatten, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_scalar_output, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_specialized_attribute, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_static_for_loop, 
test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_static_if, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_tensor_setattr, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_type_reflection_method, test/export/test_serialize.py::TestDeserialize::test_exportdb_supported_case_user_input_mutation, test/export/test_serialize.py::TestDeserialize::test_forward_compatibility, test/export/test_serialize.py::TestDeserialize::test_get_attr, test/export/test_serialize.py::TestDeserialize::test_get_attr_list, test/export/test_serialize.py::TestDeserialize::test_hoo_symint_input, test/export/test_serialize.py::TestDeserialize::test_list_of_optional_tensors, test/export/test_serialize.py::TestDeserialize::test_map, test/export/test_serialize.py::TestDeserialize::test_module, test/export/test_serialize.py::TestDeserialize::test_module_meta, test/export/test_serialize.py::TestDeserialize::test_multi_return, test/export/test_serialize.py::TestDeserialize::test_multiple_getitem, test/export/test_serialize.py::TestDeserialize::test_none_input, test/export/test_serialize.py::TestDeserialize::test_optional_tuple, test/export/test_serialize.py::TestDeserialize::test_positional_argument_with_default_value, test/export/test_serialize.py::TestDeserialize::test_pytree_namedtuple, test/export/test_serialize.py::TestDeserialize::test_serialize_float8, test/export/test_serialize.py::TestDeserialize::test_shape, test/export/test_serialize.py::TestDeserialize::test_sym_bool, test/export/test_serialize.py::TestDeserialize::test_sym_bool_dynamic_shapes, test/export/test_serialize.py::TestDeserialize::test_sym_bool_torch_check_equal, test/export/test_serialize.py::TestDeserialize::test_sym_float, test/export/test_serialize.py::TestDeserialize::test_sym_int_torch_check_equal, test/export/test_serialize.py::TestDeserialize::test_sym_ite, test/export/test_serialize.py::TestDeserialize::test_tensor_tensor_list, 
test/export/test_serialize.py::TestDeserialize::test_unbacked_bindings_serialize, test/export/test_serialize.py::TestSchemaVersioning::test_error, test/export/test_serialize.py::TestSaveLoad::test_deserialize_torch_artifact_dict, test/export/test_serialize.py::TestSaveLoad::test_save_buffer, test/export/test_serialize.py::TestSaveLoad::test_save_constants, test/export/test_serialize.py::TestSaveLoad::test_save_extra, test/export/test_serialize.py::TestSaveLoad::test_save_file, test/export/test_serialize.py::TestSaveLoad::test_save_load_with_multiple_empty_tensors, test/export/test_serialize.py::TestSaveLoad::test_save_path, test/export/test_serialize.py::TestSaveLoad::test_version_error, test/export/test_serialize.py::TestSerializeCustomClass::test_backed_size_oblivious_serdes, test/export/test_serialize.py::TestSerializeCustomClass::test_custom_class, test/export/test_serialize.py::TestSerializeCustomClass::test_custom_class_containing_fake_tensor, test/export/test_serialize.py::TestSerializeCustomClass::test_custom_class_input_to_function, test/export/test_serialize.py::TestSerializeCustomClass::test_custom_tag_metadata_copy, test/export/test_serialize.py::TestSerializeCustomClass::test_custom_tag_metadata_decomp, test/export/test_serialize.py::TestSerializeCustomClass::test_custom_tag_metadata_serialization, test/export/test_serialize.py::TestSerializeCustomClass::test_unbacked_range_serdes
2025-12-04T15:22:25.7831962Z 
2025-12-04T15:22:25.7832296Z Finished export/test_serialize 1/1 ... [2025-12-04 15:22:25.773954][22103.383860351], took 0.63min
2025-12-04T15:22:25.8067009Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_serialize/export.test_serialize-c63da72846ec1ca6.xml
2025-12-04T15:22:25.8981494Z Running inductor/test_move_constructors_to_gpu 1/1 ... [2025-12-04 15:22:25.897782][22103.507689601]
2025-12-04T15:22:25.8982136Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:22:25.8985202Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_move_constructors_to_gpu.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:22:25.898229]
2025-12-04T15:22:49.0979215Z 
2025-12-04T15:22:49.0980415Z inductor/test_move_constructors_to_gpu 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_move_constructors_to_gpu_1.1_3373ad77744fe6e4_.log
2025-12-04T15:22:49.0984899Z Running 7 items in this shard: test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_multi_gpu, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_multiple_constructors, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_no_gpu, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_non_convertable_op_failure, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_output_failure, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_sets_equiv, test/inductor/test_move_constructors_to_gpu.py::TestMoveConstructorsToGpu::test_simple
2025-12-04T15:22:49.0988477Z 
2025-12-04T15:22:49.0988889Z Finished inductor/test_move_constructors_to_gpu 1/1 ... [2025-12-04 15:22:49.097676][22126.707586095], took 0.39min
2025-12-04T15:22:49.1298668Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_move_constructors_to_gpu/inductor.test_move_constructors_to_gpu-68ab4975dd79b7d5.xml
2025-12-04T15:22:49.2088539Z Running inductor/test_remote_cache 1/1 ... [2025-12-04 15:22:49.208540][22126.818447548]
2025-12-04T15:22:49.2089122Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:22:49.2092121Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_remote_cache.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:22:49.208976]
2025-12-04T15:22:54.5310272Z 
2025-12-04T15:22:54.5311323Z inductor/test_remote_cache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_remote_cache_1.1_46ddba7c7bb0dd06_.log
2025-12-04T15:22:54.5313568Z Running 3 items in this shard: test/inductor/test_remote_cache.py::TestRemoteCache::test_failure_logging, test/inductor/test_remote_cache.py::TestRemoteCache::test_failure_no_sample, test/inductor/test_remote_cache.py::TestRemoteCache::test_normal_logging
2025-12-04T15:22:54.5314851Z 
2025-12-04T15:22:54.5315211Z Finished inductor/test_remote_cache 1/1 ... [2025-12-04 15:22:54.530810][22132.140720712], took 0.09min
2025-12-04T15:22:54.5633429Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-3da887a4cab9e620.xml
2025-12-04T15:22:54.5928361Z Running inductor/test_coordinate_descent_tuner 1/1 ... [2025-12-04 15:22:54.592564][22132.202470891]
2025-12-04T15:22:54.5929027Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:22:54.5932154Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_coordinate_descent_tuner.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:22:54.592979]
2025-12-04T15:23:13.2400325Z 
2025-12-04T15:23:13.2401701Z inductor/test_coordinate_descent_tuner 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_coordinate_descent_tuner_1.1_ec23ddb0902f120e_.log
2025-12-04T15:23:13.2405287Z Running 5 items in this shard: test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_abs_function, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_get_neighbour_values, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_no_neighbors, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_persistent_reduction, test/inductor/test_coordinate_descent_tuner.py::TestCoordinateDescentTuner::test_value_too_large
2025-12-04T15:23:13.2407982Z 
2025-12-04T15:23:13.2408400Z Finished inductor/test_coordinate_descent_tuner 1/1 ... [2025-12-04 15:23:13.239819][22150.849729028], took 0.31min
2025-12-04T15:23:13.2729144Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6824af132d005f6c.xml
2025-12-04T15:23:13.3537971Z Running inductor/test_inplace_padding 1/1 ... [2025-12-04 15:23:13.353513][22150.963420266]
2025-12-04T15:23:13.3538581Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:23:13.3542126Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_inplace_padding.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:23:13.353960]
2025-12-04T15:23:35.5519092Z 
2025-12-04T15:23:35.5520170Z inductor/test_inplace_padding 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_inplace_padding_1.1_79ffe73bfaa271da_.log
2025-12-04T15:23:35.5524979Z Running 9 items in this shard: test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_linear_and_cel_max_autotune, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_input, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_mutating_padding_output, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_non_zero_cpp_wrapper, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_pad_too_large, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_due_to_fusion, test/inductor/test_inplace_padding.py::InplacePaddingTest::test_skip_pad_input
2025-12-04T15:23:35.5528862Z 
2025-12-04T15:23:35.5529236Z Finished inductor/test_inplace_padding 1/1 ... [2025-12-04 15:23:35.551684][22173.161593597], took 0.37min
2025-12-04T15:23:35.5841645Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-f371eec712e8c5c4.xml
2025-12-04T15:23:35.6625837Z Running inductor/test_cudacodecache 1/1 ... [2025-12-04 15:23:35.662280][22173.2721883]
2025-12-04T15:23:35.6626408Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:23:35.6629284Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cudacodecache.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:23:35.662683]
2025-12-04T15:23:47.7451512Z 
2025-12-04T15:23:47.7452563Z inductor/test_cudacodecache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cudacodecache_1.1_0486dc99f2c38224_.log
2025-12-04T15:23:47.7454616Z Running 3 items in this shard: test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_async_compile, test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_compilation_error, test/inductor/test_cudacodecache.py::TestCUDACodeCache::test_cuda_load
2025-12-04T15:23:47.7455940Z 
2025-12-04T15:23:47.7456321Z Finished inductor/test_cudacodecache 1/1 ... [2025-12-04 15:23:47.744930][22185.354839995], took 0.20min
2025-12-04T15:23:47.7780244Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-2709b5a1f66ec7aa.xml
2025-12-04T15:23:47.8932973Z Running inductor/test_minifier_utils 1/1 ... [2025-12-04 15:23:47.892900][22185.502807389]
2025-12-04T15:23:47.8933624Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:23:47.8936339Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_minifier_utils.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:23:47.893354]
2025-12-04T15:23:55.9195003Z 
2025-12-04T15:23:55.9196246Z inductor/test_minifier_utils 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_minifier_utils_1.1_29e2300addd2b151_.log
2025-12-04T15:23:55.9198371Z Running 3 items in this shard: test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_convert_module_to_string, test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_invalid_output, test/inductor/test_minifier_utils.py::MinifierUtilsTests::test_non_exportable
2025-12-04T15:23:55.9200024Z 
2025-12-04T15:23:55.9200394Z Finished inductor/test_minifier_utils 1/1 ... [2025-12-04 15:23:55.919240][22193.529149315], took 0.13min
2025-12-04T15:23:55.9525454Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-8db87fb30c1e8868.xml
2025-12-04T15:23:56.0513608Z Running inductor/test_debug_trace 1/1 ... [2025-12-04 15:23:56.051046][22193.660953429]
2025-12-04T15:23:56.0514195Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:23:56.0517053Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_debug_trace.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:23:56.051471]
2025-12-04T15:24:19.6010245Z 
2025-12-04T15:24:19.6011259Z inductor/test_debug_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_debug_trace_1.1_9dbcd0e5470fca07_.log
2025-12-04T15:24:19.6013244Z Running 3 items in this shard: test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_multi_tempalte, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_printer_const, test/inductor/test_debug_trace.py::TestDebugTrace::test_debug_trace
2025-12-04T15:24:19.6014523Z 
2025-12-04T15:24:19.6014869Z Finished inductor/test_debug_trace 1/1 ... [2025-12-04 15:24:19.600795][22217.210704721], took 0.39min
2025-12-04T15:24:19.6336513Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-f206ac6f91b833b9.xml
2025-12-04T15:24:19.7202285Z Running inductor/test_foreach 1/1 ... [2025-12-04 15:24:19.719805][22217.329712191]
2025-12-04T15:24:19.7202924Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:24:19.7205489Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_foreach.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:24:19.720273]
2025-12-04T15:33:07.4387486Z 
2025-12-04T15:33:07.4391423Z inductor/test_foreach 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_foreach_1.1_72dc555a9d39f8a0_.log
2025-12-04T15:33:07.4632155Z Running 536 items in this shard: test/inductor/test_foreach.py::ForeachTests::test_2d_block_mixed_sizes_with_mask, test/inductor/test_foreach.py::ForeachTests::test_2d_block_no_mixed_sizes_no_mask, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_add, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_addrecip_op, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_elems_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_add, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_mul, 
test/inductor/test_foreach.py::ForeachTests::test_2d_blocking_partitioning_mixed_sizes_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_aliasing, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_broadcasting__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_broadcasting_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_maximum, 
test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_cpu_cpp_fallback_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_decomp__foreach_addcdiv, test/inductor/test_foreach.py::ForeachTests::test_decomp__foreach_addcmul, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_minimum, 
test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_dynamic_shapes_fallback_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_enable_dynamic_shapes_cpp_wrapper_cuda, test/inductor/test_foreach.py::ForeachTests::test_enable_dynamic_shapes_cpp_wrapper_xpu, test/inductor/test_foreach.py::ForeachTests::test_enable_dynamic_shapes_python_wrapper, test/inductor/test_foreach.py::ForeachTests::test_foreach_cpp_wrapper_cuda, test/inductor/test_foreach.py::ForeachTests::test_foreach_cpp_wrapper_xpu, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_addrecip_op, 
test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_binary_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_backward_unary_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_foreach_map_input_mutation, test/inductor/test_foreach.py::ForeachTests::test_fuse_concat, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_mul, 
test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_fusion_duplicate_buffer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list__foreach_sub, 
test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_clamp_max, 
test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_kernel_split_arg_limit_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_multi_device, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_abs, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_clamp_min, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_copy, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_add_op, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_producer_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_div, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_consumer_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_addcmul_op, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_mul, 
test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_non_foreach_producer_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_add_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_div_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_mul_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing__foreach_sub_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_add_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_div_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_mul_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_after__foreach_sub_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_add_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_div_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_mul_, test/inductor/test_foreach.py::ForeachTests::test_reinplacing_mut_before__foreach_sub_, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_abs, 
test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_copy, 
test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_clamp_max, 
test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_scheduler_fusion_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_single_list__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_addrecip_op, 
test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_single_list_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_clamp_max, 
test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_single_scalar_tensor_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_abs, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_add, 
test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_neg, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_rsqrt, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_sign, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_sqrt, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_abs, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_addcmul_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_div, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_maximum, 
test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_neg, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_recipaddmul_op, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_reciprocal, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_sign, test/inductor/test_foreach.py::ForeachTests::test_singleton_lists_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_add, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_copy, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_div, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_maximum, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_minimum, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_mul, test/inductor/test_foreach.py::ForeachTests::test_type_promotion__foreach_sub, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_add, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_add_op, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_addrecip_op, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_clamp_max, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_clamp_min, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_copy, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_div, 
test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_maximum, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_minimum, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_mul, test/inductor/test_foreach.py::ForeachTests::test_type_promotion_foreach_map_sub, test/inductor/test_foreach.py::ForeachTests::test_zero_elems
2025-12-04T15:33:07.4868714Z 
2025-12-04T15:33:07.4869085Z Finished inductor/test_foreach 1/1 ... [2025-12-04 15:33:07.439400][22745.049306979], took 8.80min
2025-12-04T15:33:07.4870358Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_foreach/inductor.test_foreach-dd7ec36049f8e4a8.xml
2025-12-04T15:33:08.8286564Z Uploading artifacts took 1.24 seconds
2025-12-04T15:33:08.8290675Z Running inductor/test_cache 1/1 ... [2025-12-04 15:33:08.828873][22746.438779917]
2025-12-04T15:33:08.8291342Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:33:08.8296034Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_cache.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:33:08.829327]
2025-12-04T15:34:02.6906755Z 
2025-12-04T15:34:02.6907944Z inductor/test_cache 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_cache_1.1_b15a3258d122eb10_.log
2025-12-04T15:34:02.7292636Z Running 725 items in this shard: test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type0_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type2_get_first_False, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type1_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type4_get_first_True, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type0_key_type2_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type0_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type1_get_first_False, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type1_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type3_get_first_True, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type1_key_type2_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type0_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type0_get_first_False, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type2_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type1_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type0_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type0_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type1_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type1_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type2_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type2_get_first_True, 
test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type3_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type3_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type4_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type4_get_first_True, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type5_get_first_False, test/inductor/test_cache.py::CacheTest::test_combo_concurrent_cache_type2_key_type2_value_type5_get_first_True, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type2, 
test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type1, 
test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_cache_type2_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type0_value_type5, 
test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type2, 
test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type1_value_type5, 
test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_get_concurrent_cache_type2_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type3, 
test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type1, 
test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_cache_type2_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type0_value_type5, 
test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type1, 
test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type0_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type3, 
test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type1_value_type5, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type0, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type1, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type2, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type3, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type4, test/inductor/test_cache.py::CacheTest::test_insert_concurrent_cache_type2_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type0_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type1_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type2_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type2_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type3_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type4_get_first_False, 
test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type5_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type0_value_type5_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type0_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type1_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type2_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type2_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type3_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type4_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type5_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type1_value_type5_get_first_True, 
test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type0_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type1_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type2_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type2_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type3_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type4_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type5_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type0_key_type2_value_type5_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type0_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type1_get_first_False, 
test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type2_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type2_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type3_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type4_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type5_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type0_value_type5_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type0_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type1_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type2_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type2_get_first_True, 
test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type3_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type4_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type5_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type1_value_type5_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type0_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type0_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type1_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type1_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type2_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type2_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type3_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type3_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type4_get_first_False, 
test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type4_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type5_get_first_False, test/inductor/test_cache.py::AsyncCacheTest::test_combo_async_concurrent_async_cache_type1_key_type2_value_type5_get_first_True, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type2, 
test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type3, 
test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_async_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type3, 
test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type2, 
test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_get_async_concurrent_async_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type3, 
test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type4, 
test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_async_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type3, 
test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type0_key_type2_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type0_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type2, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type1_value_type5, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type0, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type1, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type2, 
test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type3, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type4, test/inductor/test_cache.py::AsyncCacheTest::test_insert_async_concurrent_async_cache_type1_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type1_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type2, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_bad_encoding_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type1_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type1, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_duplicated_entries_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type0_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type0_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type0_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type0_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type1_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type1_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type1_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type1_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type2_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type2_with_whitespace_False_with_semicolon_suffix_True, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type2_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type2_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type3_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type3_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type3_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type3_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type4_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type4_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type4_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type4_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type5_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type5_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type5_with_whitespace_True_with_semicolon_suffix_False, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type0_value_type5_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type0_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type0_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type0_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type0_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type1_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type1_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type1_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type1_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type2_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type2_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type2_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type2_with_whitespace_True_with_semicolon_suffix_True, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type3_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type3_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type3_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type3_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type4_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type4_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type4_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type4_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type5_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type5_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type5_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type1_value_type5_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type0_with_whitespace_False_with_semicolon_suffix_False, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type0_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type0_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type0_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type1_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type1_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type1_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type1_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type2_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type2_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type2_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type2_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type3_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type3_with_whitespace_False_with_semicolon_suffix_True, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type3_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type3_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type4_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type4_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type4_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type4_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type5_with_whitespace_False_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type5_with_whitespace_False_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type5_with_whitespace_True_with_semicolon_suffix_False, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_key_type2_value_type5_with_whitespace_True_with_semicolon_suffix_True, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type3, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type1_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_missing_comma_separator_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type1, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type1_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_env_var_not_un_pickle_able_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type0, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type1_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_dict, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type0, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type0_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type4, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type1_value_type5, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type0, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type1, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type2, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type3, test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type4, 
test/inductor/test_cache.py::OtherTest::test_in_memory_cache_from_file_path_not_un_pickle_able_key_type2_value_type5, test/inductor/test_cache.py::OtherTest::test_on_disk_cache_fpath_from_key_un_pickle_able_on_disk_cache_type0, test/inductor/test_cache.py::OtherTest::test_on_disk_cache_fpath_from_key_un_pickle_able_on_disk_cache_type1, test/inductor/test_cache.py::OtherTest::test_on_disk_cache_version_bump_on_disk_cache_type0, test/inductor/test_cache.py::OtherTest::test_on_disk_cache_version_bump_on_disk_cache_type1
2025-12-04T15:34:02.7665781Z 
2025-12-04T15:34:02.7666347Z Finished inductor/test_cache 1/1 ... [2025-12-04 15:34:02.691898][22800.301804729], took 0.90min
2025-12-04T15:34:02.7667570Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_cache/inductor.test_cache-b64adfa949e710fa.xml
2025-12-04T15:34:02.8240481Z Running dynamo/test_config 1/1 ... [2025-12-04 15:34:02.823730][22800.433637411]
2025-12-04T15:34:02.8241044Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:02.8244075Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_config.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:02.824166]
2025-12-04T15:34:11.1007783Z 
2025-12-04T15:34:11.1008769Z dynamo/test_config 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_config_1.1_34b955669d56d548_.log
2025-12-04T15:34:11.1011262Z Running 5 items in this shard: test/dynamo/test_config.py::ConfigTests::test_automatic_dynamic, test/dynamo/test_config.py::ConfigTests::test_config_compile_ignored, test/dynamo/test_config.py::ConfigTests::test_config_hash, test/dynamo/test_config.py::ConfigTests::test_no_assume_static_by_default, test/dynamo/test_config.py::ConfigTests::test_no_automatic_dynamic
2025-12-04T15:34:11.1013096Z 
2025-12-04T15:34:11.1013415Z Finished dynamo/test_config 1/1 ... [2025-12-04 15:34:11.100527][22808.710436756], took 0.14min
2025-12-04T15:34:11.1351655Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_config/dynamo.test_config-b59ec438e7f139b2.xml
2025-12-04T15:34:11.2165097Z Running dynamo/test_metrics_context 1/1 ... [2025-12-04 15:34:11.216227][22808.82611982]
2025-12-04T15:34:11.2165685Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:11.2169271Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_metrics_context.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:11.216669]
2025-12-04T15:34:16.7894751Z 
2025-12-04T15:34:16.7895834Z dynamo/test_metrics_context 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_metrics_context_1.1_5c0162a494019d34_.log
2025-12-04T15:34:16.7900634Z Running 9 items in this shard: test/dynamo/test_metrics_context.py::TestMetricsContext::test_add_to_set, test/dynamo/test_metrics_context.py::TestMetricsContext::test_context_exists, test/dynamo/test_metrics_context.py::TestMetricsContext::test_nested_context, test/dynamo/test_metrics_context.py::TestMetricsContext::test_set, test/dynamo/test_metrics_context.py::TestMetricsContext::test_set_disallow_overwrite, test/dynamo/test_metrics_context.py::TestMetricsContext::test_set_key_value, test/dynamo/test_metrics_context.py::TestMetricsContext::test_top_n, test/dynamo/test_metrics_context.py::TestMetricsContext::test_update_allow_overwrite, test/dynamo/test_metrics_context.py::TestMetricsContext::test_update_disallow_overwrite
2025-12-04T15:34:16.7904548Z 
2025-12-04T15:34:16.7904921Z Finished dynamo/test_metrics_context 1/1 ... [2025-12-04 15:34:16.789239][22814.39914817], took 0.09min
2025-12-04T15:34:16.8240251Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_metrics_context/dynamo.test_metrics_context-8c54ce911c65a1d8.xml
2025-12-04T15:34:16.8590747Z Running export/test_package 1/1 ... [2025-12-04 15:34:16.858782][22814.468688592]
2025-12-04T15:34:16.8591317Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:16.8594671Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_package.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:16.859211]
2025-12-04T15:34:22.7326575Z 
2025-12-04T15:34:22.7327817Z export/test_package 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_package_1.1_c7910f2956ab0b71_.log
2025-12-04T15:34:22.7329809Z Running 4 items in this shard: test/export/test_package.py::TestPackage::test_basic, test/export/test_package.py::TestPackage::test_error, test/export/test_package.py::TestPackage::test_more_than_once, test/export/test_package.py::TestPackage::test_overloads
2025-12-04T15:34:22.7331124Z 
2025-12-04T15:34:22.7331475Z Finished export/test_package 1/1 ... [2025-12-04 15:34:22.732445][22820.342355037], took 0.10min
2025-12-04T15:34:22.7672833Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_package/export.test_package-ca7d9252e60c0b85.xml
2025-12-04T15:34:22.8024808Z Running dynamo/test_nops 1/1 ... [2025-12-04 15:34:22.802192][22820.412101026]
2025-12-04T15:34:22.8025362Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:22.8028600Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_nops.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:22.802610]
2025-12-04T15:34:28.8762772Z 
2025-12-04T15:34:28.8763724Z dynamo/test_nops 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_nops_1.1_eec8955a89c0749e_.log
2025-12-04T15:34:28.8765452Z Running 4 items in this shard: test/dynamo/test_nops.py::NopTests::test1, test/dynamo/test_nops.py::NopTests::test2, test/dynamo/test_nops.py::NopTests::test3, test/dynamo/test_nops.py::NopTests::test_extended_args
2025-12-04T15:34:28.8766525Z 
2025-12-04T15:34:28.8766829Z Finished dynamo/test_nops 1/1 ... [2025-12-04 15:34:28.876065][22826.485973511], took 0.10min
2025-12-04T15:34:28.9108746Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_nops/dynamo.test_nops-06a6514c719bc621.xml
2025-12-04T15:34:28.9957663Z Running inductor/test_graph_transform_observer 1/1 ... [2025-12-04 15:34:28.995416][22826.605324831]
2025-12-04T15:34:28.9958347Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:28.9961145Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_graph_transform_observer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:28.995855]
2025-12-04T15:34:39.1755987Z 
2025-12-04T15:34:39.1757449Z inductor/test_graph_transform_observer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_graph_transform_observer_1.1_2166094392cbcf10_.log
2025-12-04T15:34:39.1759098Z Running 1 items in this shard: test/inductor/test_graph_transform_observer.py::TestGraphTransformObserver::test_sdpa_rewriter
2025-12-04T15:34:39.1759754Z 
2025-12-04T15:34:39.1760200Z Finished inductor/test_graph_transform_observer 1/1 ... [2025-12-04 15:34:39.175384][22836.785293487], took 0.17min
2025-12-04T15:34:39.2104318Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_graph_transform_observer/inductor.test_graph_transform_observer-7fa27194a995b7de.xml
2025-12-04T15:34:39.2819659Z Running export/test_db 1/1 ... [2025-12-04 15:34:39.281670][22836.891578869]
2025-12-04T15:34:39.2820217Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:39.2823274Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_db.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:39.282097]
2025-12-04T15:34:50.5124082Z 
2025-12-04T15:34:50.5124995Z export/test_db 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_db_1.1_e88cbc04d8a44796_.log
2025-12-04T15:34:50.5141126Z Running 36 items in this shard: test/export/test_db.py::ExampleTests::test_exportdb_not_supported_case_dynamic_shape_round, test/export/test_db.py::ExampleTests::test_exportdb_not_supported_case_unsupported_operator, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_assume_constant_result, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_autograd_function, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_class_method, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_branch_class_method, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_branch_nested_function, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_branch_nonlocal_variables, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_closed_over_variable, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_operands, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_cond_predicate, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_constrain_as_size_example, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_constrain_as_value_example, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_decorator, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dictionary, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_assert, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_constructor, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_if_guard, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_map, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_slicing, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_dynamic_shape_view, 
test/export/test_db.py::ExampleTests::test_exportdb_supported_case_fn_with_kwargs, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_list_contains, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_list_unpack, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_model_attr_mutation, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_nested_function, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_null_context_manager, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_optional_input, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_pytree_flatten, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_scalar_output, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_specialized_attribute, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_static_for_loop, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_static_if, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_tensor_setattr, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_type_reflection_method, test/export/test_db.py::ExampleTests::test_exportdb_supported_case_user_input_mutation
2025-12-04T15:34:50.5156767Z 
2025-12-04T15:34:50.5157067Z Finished export/test_db 1/1 ... [2025-12-04 15:34:50.512240][22848.12214896], took 0.19min
2025-12-04T15:34:50.5474656Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_db/export.test_db-656b1fb51498c2a2.xml
2025-12-04T15:34:50.6351264Z Running dynamo/test_export_mutations 1/1 ... [2025-12-04 15:34:50.634812][22848.244719678]
2025-12-04T15:34:50.6351892Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:50.6354749Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_export_mutations.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:50.635238]
2025-12-04T15:34:58.5112200Z 
2025-12-04T15:34:58.5113601Z dynamo/test_export_mutations 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_export_mutations_1.1_68937c62c4814f0f_.log
2025-12-04T15:34:58.5117377Z Running 5 items in this shard: test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_1, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_2, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_3, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_negative_4, test/dynamo/test_export_mutations.py::MutationExportTests::test_module_attribute_mutation_violation_positive_1
2025-12-04T15:34:58.5120400Z 
2025-12-04T15:34:58.5120785Z Finished dynamo/test_export_mutations 1/1 ... [2025-12-04 15:34:58.510979][22856.12088544], took 0.13min
2025-12-04T15:34:58.5469743Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_export_mutations/dynamo.test_export_mutations-ac0f456ff528df13.xml
2025-12-04T15:34:58.6291565Z Running inductor/test_config 1/1 ... [2025-12-04 15:34:58.628822][22856.23872865]
2025-12-04T15:34:58.6292133Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:34:58.6295437Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_config.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:34:58.629282]
2025-12-04T15:35:17.6726751Z 
2025-12-04T15:35:17.6727912Z inductor/test_config 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_config_1.1_8da77f3c96eb0a54_.log
2025-12-04T15:35:17.6734194Z Running 14 items in this shard: test/inductor/test_config.py::TestInductorConfig::test_api_options, test/inductor/test_config.py::TestInductorConfig::test_codegen_skips_custom_passes, test/inductor/test_config.py::TestInductorConfig::test_compile_api, test/inductor/test_config.py::TestInductorConfig::test_compile_api_passes_config, test/inductor/test_config.py::TestInductorConfig::test_get_compiler_config, test/inductor/test_config.py::TestInductorConfig::test_hasattr, test/inductor/test_config.py::TestInductorConfig::test_invalid_backend, test/inductor/test_config.py::TestInductorConfig::test_invalid_names, test/inductor/test_config.py::TestInductorConfig::test_non_inductor_backend, test/inductor/test_config.py::TestInductorConfig::test_options_do_something, test/inductor/test_config.py::TestInductorConfig::test_patch, test/inductor/test_config.py::TestInductorConfig::test_save_load, test/inductor/test_config.py::TestInductorConfig::test_select_decomp_table_fallback_embedding_bag_byte_unpack, test/inductor/test_config.py::TestInductorConfig::test_set
2025-12-04T15:35:17.6740005Z 
2025-12-04T15:35:17.6740339Z Finished inductor/test_config 1/1 ... [2025-12-04 15:35:17.672437][22875.282345389], took 0.32min
2025-12-04T15:35:17.7088850Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_config/inductor.test_config-891cd7b3aeb3b5ed.xml
2025-12-04T15:35:17.7941111Z Running inductor/test_dependencies 1/1 ... [2025-12-04 15:35:17.793793][22875.403699783]
2025-12-04T15:35:17.7941734Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:35:17.7944887Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_dependencies.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:17.794242]
2025-12-04T15:35:28.1240901Z 
2025-12-04T15:35:28.1242009Z inductor/test_dependencies 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_dependencies_1.1_a229a828add2b21e_.log
2025-12-04T15:35:28.1245454Z Running 5 items in this shard: test/inductor/test_dependencies.py::TestDependencies::test_bucketize_dependencies_no_sorter, test/inductor/test_dependencies.py::TestDependencies::test_bucketize_dependencies_sorter, test/inductor/test_dependencies.py::TestDependencies::test_get_offset, test/inductor/test_dependencies.py::TestDependencies::test_normalize_with_stride_order_equal, test/inductor/test_dependencies.py::TestDependencies::test_normalize_with_stride_order_unequal
2025-12-04T15:35:28.1248008Z 
2025-12-04T15:35:28.1248379Z Finished inductor/test_dependencies 1/1 ... [2025-12-04 15:35:28.123859][22885.733768269], took 0.17min
2025-12-04T15:35:28.1594430Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_dependencies/inductor.test_dependencies-0956f606bfbef853.xml
2025-12-04T15:35:28.2466550Z Running inductor/test_fuzzer 1/1 ... [2025-12-04 15:35:28.246328][22885.856235643]
2025-12-04T15:35:28.2467146Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:35:28.2470172Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fuzzer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:28.246759]
2025-12-04T15:35:49.8441110Z 
2025-12-04T15:35:49.8442093Z inductor/test_fuzzer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fuzzer_1.1_7ef41a4207e7fec8_.log
2025-12-04T15:35:49.8447441Z Running 11 items in this shard: test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_bisector_boolean, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_bisector_exception, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_dynamo_bisect, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_inductor_bisect, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_inductor_cpu, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_inductor_gpu, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_config_fuzzer_n_tuple, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_fuzzer_inductor_calling_compile, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_fuzzer_running_test, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_sampling_method_random, test/inductor/test_fuzzer.py::TestConfigFuzzer::test_sampling_method_toggle
2025-12-04T15:35:49.8452014Z 
2025-12-04T15:35:49.8452365Z Finished inductor/test_fuzzer 1/1 ... [2025-12-04 15:35:49.843861][22907.453769513], took 0.36min
2025-12-04T15:35:49.8799911Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fuzzer/inductor.test_fuzzer-848012b685a936d2.xml
2025-12-04T15:35:49.9873148Z Running dynamo/test_global 1/1 ... [2025-12-04 15:35:49.986944][22907.59685115]
2025-12-04T15:35:49.9873776Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:35:49.9876486Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_global.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:35:49.987372]
2025-12-04T15:36:06.0268960Z 
2025-12-04T15:36:06.0270122Z dynamo/test_global 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_global_1.1_be67321ce36fdfe2_.log
2025-12-04T15:36:06.0275521Z Running 12 items in this shard: test/dynamo/test_global.py::TestGlobals::test_store_global_1, test/dynamo/test_global.py::TestGlobals::test_store_global_2, test/dynamo/test_global.py::TestGlobals::test_store_global_cross_file, test/dynamo/test_global.py::TestGlobals::test_store_global_crossfile_inline, test/dynamo/test_global.py::TestGlobals::test_store_global_dict, test/dynamo/test_global.py::TestGlobals::test_store_global_dict_2, test/dynamo/test_global.py::TestGlobals::test_store_global_inline_1, test/dynamo/test_global.py::TestGlobals::test_store_global_inline_2, test/dynamo/test_global.py::TestGlobals::test_store_global_list, test/dynamo/test_global.py::TestGlobals::test_store_global_list_2, test/dynamo/test_global.py::TestGlobals::test_store_global_new, test/dynamo/test_global.py::TestGlobals::test_store_global_object
2025-12-04T15:36:06.0279614Z 
2025-12-04T15:36:06.0279932Z Finished dynamo/test_global 1/1 ... [2025-12-04 15:36:06.026671][22923.63658009], took 0.27min
2025-12-04T15:36:06.0625815Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_global/dynamo.test_global-3f6b17294db437b1.xml
2025-12-04T15:36:06.1382424Z Running inductor/test_control_flow 1/4 ... [2025-12-04 15:36:06.137933][22923.7478397]
2025-12-04T15:36:06.1383011Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:36:06.1386588Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_control_flow.py', '--shard-id=1', '--num-shards=4', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:36:06.138390]
2025-12-04T15:51:12.2748291Z 
2025-12-04T15:51:12.2749419Z inductor/test_control_flow 1/4 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_control_flow_1.4_b6ec092c04daf6c8_.log
2025-12-04T15:51:12.2983348Z Running 190 items in this shard: test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_mismatched_branch_output_size_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_multiple_outputs_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_nested_control_flow_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_non_tensor_predicates_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_outer_code_before_after_device_cuda_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_reintepret_view_inputs_outputs, test/inductor/test_control_flow.py::CondTests::test_cond_select_with_input_idx_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_simple_control_flow_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_simple_with_int_closure_device_cpu, test/inductor/test_control_flow.py::CondTests::test_cond_subgraphs_with_parameters_device_cpu_dynamic_True, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cpu_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_closure_device_cuda_dynamic_False, test/inductor/test_control_flow.py::CondTests::test_cond_unbacked_symint_inner_device_cuda, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_infinite_loop_error, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_False_autograd_False, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_simple_control_flow_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_stack_output_simple_device_cuda_dynamic_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_conv_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_in_out_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_data_dependent_ops_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_buffers_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_outer_code_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_False_autograd_False, 
test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_parameters_device_cuda_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_sym_expr_cond_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_False_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_with_unbacked_symint_closure_device_cpu_dynamic_True_autograd_True, test/inductor/test_control_flow.py::WhileLoopTests::test_while_loop_zero_loop_device_cpu_dynamic_True, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_generic_backend_inductor_cpu, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_pointwise_backend_inductor_cpu, test/inductor/test_control_flow.py::AssociativeScanTests::test_associative_scan_CUDA_flip_combine_mode_pointwise_backend_inductor_device_cuda, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_cond_in_scan_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_chunked_ce_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cpu_dynamic_True, test/inductor/test_control_flow.py::ScanTests::test_scan_compare_chunked_ce_with_no_scan_device_cuda_dynamic_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cpu_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_conv_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_False_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_False_reverse_True_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_0_pred_True_scan_length_1_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_False_dim_3_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cpu_dynamic_True_reverse_True_dim_3_pred_True_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_False_dim_3_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_0_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_False_reverse_True_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_0_pred_True_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_1_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_False_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_False_dim_3_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_True_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_True_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_0_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_False_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_True_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_1_pred_True_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_False_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_in_cond_device_cuda_dynamic_True_reverse_True_dim_3_pred_False_scan_length_5_autograd_True, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_False_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_1_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_False_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cpu_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_False_reverse_True_dim_3_scan_length_5_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_0_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_1_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_False_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_0_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_1_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_nn_modules_device_cuda_dynamic_True_reverse_True_dim_3_scan_length_5_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_False_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_False_reverse_True_dim_2_autograd_False, 
test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cpu_dynamic_True_reverse_False_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_1_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_False_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_False_reverse_True_dim_2_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_False_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_0_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_1_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_pytree_in_out_device_cuda_dynamic_True_reverse_True_dim_2_autograd_True, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cpu_dynamic_False_autograd_False, test/inductor/test_control_flow.py::ScanTests::test_scan_with_clamp_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cuda_dynamic_False_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_nested_with_cond_device_cuda_dynamic_True_autograd_True, 
test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cpu_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_True_autograd_False, test/inductor/test_control_flow.py::MapTests::test_map_simple_device_cuda_dynamic_True_autograd_True
2025-12-04T15:51:12.3189380Z 
2025-12-04T15:51:12.3197732Z Finished inductor/test_control_flow 1/4 ... [2025-12-04 15:51:12.319558][23829.929457464], took 15.10min
2025-12-04T15:51:12.3560227Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f876791985cb5a1a.xml
2025-12-04T15:51:12.4354980Z Running dynamo/test_cudagraphs 1/1 ... [2025-12-04 15:51:12.435170][23830.045077955]
2025-12-04T15:51:12.4355759Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:51:12.4358392Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_cudagraphs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:51:12.435574]
2025-12-04T15:51:21.8133662Z 
2025-12-04T15:51:21.8134698Z dynamo/test_cudagraphs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_cudagraphs_1.1_f31f593cd6865772_.log
2025-12-04T15:51:21.8138273Z Running 8 items in this shard: test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_basic, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_dead_fill, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_dtoh, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_factory, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_htod, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutate_constant, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutate_input, test/dynamo/test_cudagraphs.py::TestAotCudagraphs::test_mutated_metadata
2025-12-04T15:51:21.8141180Z 
2025-12-04T15:51:21.8141546Z Finished dynamo/test_cudagraphs 1/1 ... [2025-12-04 15:51:21.813160][23839.423070409], took 0.16min
2025-12-04T15:51:21.8488547Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-f8e6c8e1da70ac34.xml
2025-12-04T15:51:21.9271763Z Running inductor/test_alignment 1/1 ... [2025-12-04 15:51:21.926893][23839.536800468]
2025-12-04T15:51:21.9272351Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:51:21.9275632Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_alignment.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:51:21.927318]
2025-12-04T15:51:42.9719475Z 
2025-12-04T15:51:42.9720539Z inductor/test_alignment 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_alignment_1.1_c850ab1c90ef7284_.log
2025-12-04T15:51:43.0089909Z Running 12 items in this shard: test/inductor/test_alignment.py::GPUTests::test_Q4_K_dequantization_cuda, test/inductor/test_alignment.py::GPUTests::test_alignment_without_custom_op_cuda, test/inductor/test_alignment.py::GPUTests::test_incorrect_meta_for_custom_op_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_no_align_for_custom_op_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_no_align_for_custom_op_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_1024_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_1048576_cuda, test/inductor/test_alignment.py::GPUTests::test_slice_view_dtype_size_128_cuda, test/inductor/test_alignment.py::GPUTests::test_unaligned_input_2d_cuda, test/inductor/test_alignment.py::GPUTests::test_unaligned_input_cuda, test/inductor/test_alignment.py::GPUTests::test_view_dtype_slice_cuda
2025-12-04T15:51:43.0094549Z 
2025-12-04T15:51:43.0094913Z Finished inductor/test_alignment 1/1 ... [2025-12-04 15:51:42.971741][23860.581648143], took 0.35min
2025-12-04T15:51:43.0096198Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e6a1f3fd35374247.xml
2025-12-04T15:51:43.0920557Z Running dynamo/test_profiler 1/1 ... [2025-12-04 15:51:43.091707][23860.701615104]
2025-12-04T15:51:43.0921156Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:51:43.0924265Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_profiler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:51:43.092149]
2025-12-04T15:52:01.6344248Z 
2025-12-04T15:52:01.6345563Z dynamo/test_profiler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_profiler_1.1_bdf79e2257b8f437_.log
2025-12-04T15:52:01.6351029Z Running 10 items in this shard: test/dynamo/test_profiler.py::DynamoProfilerTests::test_dynamo_timed_profiling_backend_compile, test/dynamo/test_profiler.py::DynamoProfilerTests::test_dynamo_timed_profiling_isolated, test/dynamo/test_profiler.py::DynamoProfilerTests::test_execution_trace_dynamic_shapes, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profile_dynamic_shapes_compilation, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profile_dynamic_shapes_list_compilation, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profile_dynamic_shapes_runtime, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_cache_lookup, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_cache_lookup_profiler_step, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_dynamo_compiled_region, test/dynamo/test_profiler.py::DynamoProfilerTests::test_profiler_enabled_export
2025-12-04T15:52:01.6355814Z 
2025-12-04T15:52:01.6356143Z Finished dynamo/test_profiler 1/1 ... [2025-12-04 15:52:01.634230][23879.24413783], took 0.31min
2025-12-04T15:52:01.6705938Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_profiler/dynamo.test_profiler-4c5fdfc03a5c6f47.xml
2025-12-04T15:52:01.7499993Z Running dynamo/test_guard_serialization 1/1 ... [2025-12-04 15:52:01.749664][23879.35957141]
2025-12-04T15:52:01.7500606Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:52:01.7504182Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_guard_serialization.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:52:01.750122]
2025-12-04T15:52:25.9623654Z 
2025-12-04T15:52:25.9625277Z dynamo/test_guard_serialization 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_guard_serialization_1.1_ca95c718e2b65acd_.log
2025-12-04T15:52:25.9654740Z Running 56 items in this shard: test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bool_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_method_input, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_method_patched_forward, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_methods_empty, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_bound_methods_missing, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_builtin_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_c10d_work, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_class_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_closure_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_closure_var_missing, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_constant_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_ddp_module, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_default_device, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_deterministic_algorithms, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_contains, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_keys_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_keys_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dict_version, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dispatch_key_set_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_dual_level, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_duplicate_input, 
test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_empty_nn_module_hooks_dict, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_equals_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_fsdp_training_state, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_function_locals, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_function_with_wrong_fqn, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_functorch_stack_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_global_state_guard_filter, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_grad_mode, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_grad_mode_loading, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_guard_on_key_order_with_cache, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_hasattr_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_id_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_id_match_with_config, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_mapping_keys_check, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_nn_module, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_none_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_not_present_in_generic_dict, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_range_iterator_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_sdp_backend_serialization, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_sequence_length, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_shape_env, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_skipped_objects, 
test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tensor_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tensor_subclass_metadata_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_torch_function_state, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_torch_function_state_filter, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_tuple_iterator_len, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_type_match, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unserializable_sharded_tensor, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unserializable_submodule, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_process_group, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_stream, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_unused_weakref, test/dynamo/test_guard_serialization.py::TestGuardSerialization::test_weakref_alive, test/dynamo/test_guard_serialization.py::TestGuardSerializationFSDP::test_guard_serialization_fsdp_module
2025-12-04T15:52:25.9680835Z 
2025-12-04T15:52:25.9681237Z Finished dynamo/test_guard_serialization 1/1 ... [2025-12-04 15:52:25.962215][23903.572122893], took 0.40min
2025-12-04T15:52:25.9998800Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-ad1a0cf4b0a5764d.xml
2025-12-04T15:52:26.0860948Z Running dynamo/test_dicts 1/1 ... [2025-12-04 15:52:26.085782][23903.695689497]
2025-12-04T15:52:26.0861493Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:52:26.0864668Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_dicts.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:52:26.086227]
2025-12-04T15:52:52.8890559Z 
2025-12-04T15:52:52.8891751Z dynamo/test_dicts 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_dicts_1.1_9286d343eb07609f_.log
2025-12-04T15:52:52.8939148Z Running 140 items in this shard: test/dynamo/test_dicts.py::DictTests::test_builtin_ior_, test/dynamo/test_dicts.py::DictTests::test_builtin_or_with_diff_keys, test/dynamo/test_dicts.py::DictTests::test_builtin_or_with_invalid_types, test/dynamo/test_dicts.py::DictTests::test_builtin_or_with_same_keys, test/dynamo/test_dicts.py::DictTests::test_construct_user_dict_and_return, test/dynamo/test_dicts.py::DictTests::test_contains_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_contains_module_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_custom_iter_dict, test/dynamo/test_dicts.py::DictTests::test_custom_keys_iter_dict, test/dynamo/test_dicts.py::DictTests::test_dict_construct_from_mapping_like, test/dynamo/test_dicts.py::DictTests::test_dict_construction_from_mapping_proxy, test/dynamo/test_dicts.py::DictTests::test_dict_contains, test/dynamo/test_dicts.py::DictTests::test_dict_contains_enum, test/dynamo/test_dicts.py::DictTests::test_dict_copy_alias, test/dynamo/test_dicts.py::DictTests::test_dict_guard_on_keys_order, test/dynamo/test_dicts.py::DictTests::test_dict_guard_on_keys_order2, test/dynamo/test_dicts.py::DictTests::test_dict_iter, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_and_, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_or_, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_sub, test/dynamo/test_dicts.py::DictTests::test_dict_keys_binop_op_xor, test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_iand, test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_ior, test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_isub, test/dynamo/test_dicts.py::DictTests::test_dict_keys_inplace_binop_op_ixor, test/dynamo/test_dicts.py::DictTests::test_dict_list_values, test/dynamo/test_dicts.py::DictTests::test_dict_mutation_side_effect, test/dynamo/test_dicts.py::DictTests::test_dict_namedtuple, 
test/dynamo/test_dicts.py::DictTests::test_dict_order_keys, test/dynamo/test_dicts.py::DictTests::test_dict_order_keys_modules, test/dynamo/test_dicts.py::DictTests::test_dict_order_keys_tensors, test/dynamo/test_dicts.py::DictTests::test_dict_reconstruct_keeps_original_order, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_contains, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_get_method, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_initialization_in_graph, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_instantiation, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_instantiation_return, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_local_mutation, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_local_with_non_dict_method, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_methods_fallback_mutation, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_methods_fallback_readonly, test/dynamo/test_dicts.py::DictTests::test_dict_subclass_setitem, test/dynamo/test_dicts.py::DictTests::test_dict_tag_guard, test/dynamo/test_dicts.py::DictTests::test_empty_dict_recompilation, test/dynamo/test_dicts.py::DictTests::test_fn_id, test/dynamo/test_dicts.py::DictTests::test_items_type, test/dynamo/test_dicts.py::DictTests::test_iter_default_dict, test/dynamo/test_dicts.py::DictTests::test_lazy_key_guarding, test/dynamo/test_dicts.py::DictTests::test_lazy_key_non_const_guarding, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_ban_muation_on_dict_realization, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_existing, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_existing_local_mutation, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_existing_mutation, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_for_local, test/dynamo/test_dicts.py::DictTests::test_mapping_proxy_for_nonlocal, test/dynamo/test_dicts.py::DictTests::test_move_to_end, 
test/dynamo/test_dicts.py::DictTests::test_newly_constructed_default_dict, test/dynamo/test_dicts.py::DictTests::test_newly_constructed_default_dict_no_default_factory, test/dynamo/test_dicts.py::DictTests::test_newly_constructed_default_dict_with_dict, test/dynamo/test_dicts.py::DictTests::test_ordered_dict_reordered_keys, test/dynamo/test_dicts.py::DictTests::test_ordered_dict_subclass_reordered_keys, test/dynamo/test_dicts.py::DictTests::test_overridden_get_item, test/dynamo/test_dicts.py::DictTests::test_udf_dict_reconstruction, test/dynamo/test_dicts.py::DictTests::test_update_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_update_module_dunder_dict, test/dynamo/test_dicts.py::DictTests::test_weakref_dict, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_eq, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_ior, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_ne, test/dynamo/test_dicts.py::DictGuardTests::test_cmp_or, test/dynamo/test_dicts.py::DictGuardTests::test_popitem, test/dynamo/test_dicts.py::DictMethodsTests::test_binop_ior, test/dynamo/test_dicts.py::DictMethodsTests::test_binop_ior_iterable, test/dynamo/test_dicts.py::DictMethodsTests::test_binop_or, test/dynamo/test_dicts.py::DictMethodsTests::test_clear, test/dynamo/test_dicts.py::DictMethodsTests::test_cmp_eq, test/dynamo/test_dicts.py::DictMethodsTests::test_cmp_ne, test/dynamo/test_dicts.py::DictMethodsTests::test_copy, test/dynamo/test_dicts.py::DictMethodsTests::test_dict___iter__, test/dynamo/test_dicts.py::DictMethodsTests::test_dict_type_comparison, test/dynamo/test_dicts.py::DictMethodsTests::test_fromkeys, test/dynamo/test_dicts.py::DictMethodsTests::test_functools_partial_key, test/dynamo/test_dicts.py::DictMethodsTests::test_get, test/dynamo/test_dicts.py::DictMethodsTests::test_items, test/dynamo/test_dicts.py::DictMethodsTests::test_keys, test/dynamo/test_dicts.py::DictMethodsTests::test_namedtuple_functools, test/dynamo/test_dicts.py::DictMethodsTests::test_pop, 
test/dynamo/test_dicts.py::DictMethodsTests::test_popitem, test/dynamo/test_dicts.py::DictMethodsTests::test_setdefault, test/dynamo/test_dicts.py::DictMethodsTests::test_type, test/dynamo/test_dicts.py::DictMethodsTests::test_update, test/dynamo/test_dicts.py::DictMethodsTests::test_values, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_binop_ior, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_binop_ior_iterable, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_binop_or, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_clear, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_cmp_eq, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_cmp_ne, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_copy, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_dict___iter__, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_dict_type_comparison, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_fromkeys, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_functools_partial_key, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_get, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_items, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_keys, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_namedtuple_functools, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_pop, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_popitem, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_setdefault, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_type, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_update, test/dynamo/test_dicts.py::DictSubclassMethodsTests::test_values, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_ior, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_ior_iterable, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_ior_return_type, 
test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_or, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_binop_or_return_type, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_clear, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_cmp_eq, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_cmp_eq_order, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_cmp_ne, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_copy, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_dict___iter__, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_dict_type_comparison, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_fromkeys, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_functools_partial_key, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_get, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_items, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_keys, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_move_to_end, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_namedtuple_functools, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_pop, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_popitem, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_popitem_kwarg, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_setdefault, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_type, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_update, test/dynamo/test_dicts.py::OrderedDictMethodsTests::test_values, test/dynamo/test_dicts.py::OrderedDictSubclassOverload::test_move_to_end
2025-12-04T15:52:52.8985551Z 
2025-12-04T15:52:52.8985876Z Finished dynamo/test_dicts 1/1 ... [2025-12-04 15:52:52.889062][23930.49896874], took 0.45min
2025-12-04T15:52:52.9345890Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_dicts/dynamo.test_dicts-e677e083bbe15d92.xml
2025-12-04T15:52:53.0377114Z Running dynamo/test_optimizers 1/1 ... [2025-12-04 15:52:53.037413][23930.647318861]
2025-12-04T15:52:53.0377709Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:52:53.0381687Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_optimizers.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:52:53.037846]
2025-12-04T15:53:02.0149657Z 
2025-12-04T15:53:02.0150720Z dynamo/test_optimizers 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_optimizers_1.1_6e8896f6f8ab34bf_.log
2025-12-04T15:53:02.0152673Z Running 3 items in this shard: test/dynamo/test_optimizers.py::End2EndTests::test_init_group, test/dynamo/test_optimizers.py::End2EndTests::test_optimizing_over_tensor_with_requires_grad, test/dynamo/test_optimizers.py::End2EndTests::test_state_dict
2025-12-04T15:53:02.0153981Z 
2025-12-04T15:53:02.0154335Z Finished dynamo/test_optimizers 1/1 ... [2025-12-04 15:53:02.014740][23939.624650069], took 0.15min
2025-12-04T15:53:02.0515936Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_optimizers/dynamo.test_optimizers-a32616c44840c4cb.xml
2025-12-04T15:53:02.1544465Z Running export/test_torchbind 1/1 ... [2025-12-04 15:53:02.154182][23939.76408984]
2025-12-04T15:53:02.1545032Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:53:02.1548248Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_torchbind.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:53:02.154584]
2025-12-04T15:53:37.1213321Z 
2025-12-04T15:53:37.1214370Z export/test_torchbind 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_torchbind_1.1_2a7aef954986f1ed_.log
2025-12-04T15:53:37.1263915Z Running 90 items in this shard: test/export/test_torchbind.py::TestExportTorchbind::test_aot_export_tensor_queue_operators, test/export/test_torchbind.py::TestExportTorchbind::test_attribute_as_custom_op_argument_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_attribute_as_custom_op_argument_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_attribute_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_attribute_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_list_out_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_list_out_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_tuple_out_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_tuple_out_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_unbacked_symint_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_custom_obj_unbacked_symint_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_deepcopy, test/export/test_torchbind.py::TestExportTorchbind::test_export_inplace_custom_op, test/export/test_torchbind.py::TestExportTorchbind::test_identifying_torchbind_ops, test/export/test_torchbind.py::TestExportTorchbind::test_input_as_custom_op_argument_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_input_as_custom_op_argument_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_input_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_input_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_schema_checking_script_object, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_methods_fakify_internal_states_make_fx_tracing_mode_fake, 
test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_methods_fakify_internal_states_make_fx_tracing_mode_symbolic, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_methods_make_fx_tracing_mode_fake, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_methods_make_fx_tracing_mode_symbolic, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_operators_fallthrough_via_lib_impl, test/export/test_torchbind.py::TestExportTorchbind::test_make_fx_tensor_queue_operators_fallthrough_via_py_impl, test/export/test_torchbind.py::TestExportTorchbind::test_method_schema, test/export/test_torchbind.py::TestExportTorchbind::test_non_strict_export_methods, test/export/test_torchbind.py::TestExportTorchbind::test_none_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_none_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_safe_to_trace_with_real, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_alias_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_alias_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_input_and_alias_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_input_and_alias_pre_dispatch_True, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_op_fallthrough_keys_respects_lib_impl, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_op_register_fallthrough, test/export/test_torchbind.py::TestExportTorchbind::test_torchbind_register_attr_at_runtime_get_restored, test/export/test_torchbind.py::TestExportTorchbind::test_unlift_custom_obj_pre_dispatch_False, test/export/test_torchbind.py::TestExportTorchbind::test_unlift_custom_obj_pre_dispatch_True, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_body_aliasing_contents_backend_aot_eager, 
test/export/test_torchbind.py::TestCompileTorchbind::test_compile_body_aliasing_contents_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_body_aliasing_contents_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_input_aliasing_contents_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_non_fakified_method_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_non_fakified_method_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_non_fakified_method_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_script_obj_missing_attr_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_script_obj_missing_attr_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_script_obj_setattr_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_error_on_script_obj_setattr_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_global_obj_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_global_obj_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_global_obj_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_as_hop_input_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_as_hop_input_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_as_hop_input_backend_inductor, 
test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_attributes_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_attributes_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_attributes_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_closure_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_closure_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_closure_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_graph_breaks, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cpu_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cpu_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cpu_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cuda_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cuda_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_obj_torchbind_op_with_autocast_device_cuda_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_automatic_dynamic_shape, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_backend_aot_eager, 
test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_guards_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_guards_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_script_object_input_guards_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_tensor_op_in_tensor_flatten_backend_aot_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_tensor_op_in_tensor_flatten_backend_eager, test/export/test_torchbind.py::TestCompileTorchbind::test_compile_tensor_op_in_tensor_flatten_backend_inductor, test/export/test_torchbind.py::TestCompileTorchbind::test_export_obj_torchbind_op_with_autocast_device_cpu, test/export/test_torchbind.py::TestCompileTorchbind::test_export_obj_torchbind_op_with_autocast_device_cuda, test/export/test_torchbind.py::TestRegisterFakeClass::test_register_fake_class_from_real_not_classmethod, test/export/test_torchbind.py::TestRegisterFakeClass::test_register_fake_class_no_from_real, test/export/test_torchbind.py::TestRegisterFakeClass::test_register_fake_class_no_torch_bind_class, test/export/test_torchbind.py::TestRegisterFakeClass::test_register_fake_class_valid
2025-12-04T15:53:37.1310693Z 
2025-12-04T15:53:37.1311042Z Finished export/test_torchbind 1/1 ... [2025-12-04 15:53:37.122513][23974.73241564], took 0.58min
2025-12-04T15:53:37.1602507Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_torchbind/export.test_torchbind-5ef54f6c3fc7e6e3.xml
2025-12-04T15:53:38.5908084Z Uploading artifacts took 1.35 seconds
2025-12-04T15:53:38.5912140Z Running dynamo/test_python_dispatcher 1/1 ... [2025-12-04 15:53:38.591029][23976.200936279]
2025-12-04T15:53:38.5913037Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:53:38.5916932Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_python_dispatcher.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:53:38.591466]
2025-12-04T15:53:46.7173032Z 
2025-12-04T15:53:46.7174077Z dynamo/test_python_dispatcher 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_python_dispatcher_1.1_d5e45034fa548233_.log
2025-12-04T15:53:46.7177634Z Running 6 items in this shard: test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key1, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key2, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key3, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key4, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_dispatch_key_set_guard, test/dynamo/test_python_dispatcher.py::PythonDispatcherTests::test_functorch_interpreter
2025-12-04T15:53:46.7180442Z 
2025-12-04T15:53:46.7180829Z Finished dynamo/test_python_dispatcher 1/1 ... [2025-12-04 15:53:46.717068][23984.326977713], took 0.14min
2025-12-04T15:53:46.7545625Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-323f6251761a8aee.xml
2025-12-04T15:53:46.8284532Z Running export/test_swap 1/1 ... [2025-12-04 15:53:46.828128][23984.438036132]
2025-12-04T15:53:46.8285109Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:53:46.8287884Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_swap.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:53:46.828549]
2025-12-04T15:53:57.1071579Z 
2025-12-04T15:53:57.1073084Z export/test_swap 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_swap_1.1_75b32b5d64f61c05_.log
2025-12-04T15:53:57.1080992Z Running 20 items in this shard: test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_args, test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_kwargs, test/export/test_swap.py::TestSwap_nonstrict::test_custom_input_kwargs_use_private, test/export/test_swap.py::TestSwap_nonstrict::test_custom_output, test/export/test_swap.py::TestSwap_nonstrict::test_dedup_sym_size, test/export/test_swap.py::TestSwap_nonstrict::test_nested_leaf, test/export/test_swap.py::TestSwap_nonstrict::test_remove_duplicate_pytree_different_order, test/export/test_swap.py::TestSwap_nonstrict::test_remove_duplicate_pytree_simple, test/export/test_swap.py::TestSwap_nonstrict::test_unflatten_preserve_signature, test/export/test_swap.py::TestSwap_nonstrict::test_unflatten_preserve_with_unused_input, test/export/test_swap.py::TestSwap_strict::test_custom_input_args, test/export/test_swap.py::TestSwap_strict::test_custom_input_kwargs, test/export/test_swap.py::TestSwap_strict::test_custom_input_kwargs_use_private, test/export/test_swap.py::TestSwap_strict::test_custom_output, test/export/test_swap.py::TestSwap_strict::test_dedup_sym_size, test/export/test_swap.py::TestSwap_strict::test_nested_leaf, test/export/test_swap.py::TestSwap_strict::test_remove_duplicate_pytree_different_order, test/export/test_swap.py::TestSwap_strict::test_remove_duplicate_pytree_simple, test/export/test_swap.py::TestSwap_strict::test_unflatten_preserve_signature, test/export/test_swap.py::TestSwap_strict::test_unflatten_preserve_with_unused_input
2025-12-04T15:53:57.1088938Z 
2025-12-04T15:53:57.1089248Z Finished export/test_swap 1/1 ... [2025-12-04 15:53:57.106959][23994.716868243], took 0.17min
2025-12-04T15:53:57.1436491Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_swap/export.test_swap-6940316a22c03b83.xml
2025-12-04T15:53:57.2320405Z Running export/test_unflatten 1/1 ... [2025-12-04 15:53:57.231723][23994.841631985]
2025-12-04T15:53:57.2320982Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:53:57.2323636Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_unflatten.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:53:57.232109]
2025-12-04T15:54:21.6313429Z 
2025-12-04T15:54:21.6316182Z export/test_unflatten 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_unflatten_1.1_e240ad71aaf7be43_.log
2025-12-04T15:54:21.6328507Z Running 29 items in this shard: test/export/test_unflatten.py::TestUnflatten::test_assert_tensor_metadata_stack, test/export/test_unflatten.py::TestUnflatten::test_attr_as_submod_input, test/export/test_unflatten.py::TestUnflatten::test_dedup_sym_size, test/export/test_unflatten.py::TestUnflatten::test_double_nested_submodule, test/export/test_unflatten.py::TestUnflatten::test_duplicate_placeholder, test/export/test_unflatten.py::TestUnflatten::test_fx_trace, test/export/test_unflatten.py::TestUnflatten::test_nested_leaf_non_strict, test/export/test_unflatten.py::TestUnflatten::test_placeholder_and_get_attr_ordering_after_unflattened, test/export/test_unflatten.py::TestUnflatten::test_simple_alias, test/export/test_unflatten.py::TestUnflatten::test_unflatten_buffer_mutation, test/export/test_unflatten.py::TestUnflatten::test_unflatten_constant_obj, test/export/test_unflatten.py::TestUnflatten::test_unflatten_constant_tensor, test/export/test_unflatten.py::TestUnflatten::test_unflatten_container_type, test/export/test_unflatten.py::TestUnflatten::test_unflatten_eager, test/export/test_unflatten.py::TestUnflatten::test_unflatten_empty_branch, test/export/test_unflatten.py::TestUnflatten::test_unflatten_nested, test/export/test_unflatten.py::TestUnflatten::test_unflatten_nested_access, test/export/test_unflatten.py::TestUnflatten::test_unflatten_none, test/export/test_unflatten.py::TestUnflatten::test_unflatten_param_list_dict, test/export/test_unflatten.py::TestUnflatten::test_unflatten_preserve_signature, test/export/test_unflatten.py::TestUnflatten::test_unflatten_preserve_with_unused_input, test/export/test_unflatten.py::TestUnflatten::test_unflatten_requires_grad_param, test/export/test_unflatten.py::TestUnflatten::test_unflatten_root_module_type, test/export/test_unflatten.py::TestUnflatten::test_unflatten_shared_submodule, test/export/test_unflatten.py::TestUnflatten::test_unflatten_skipped_call_module, 
test/export/test_unflatten.py::TestUnflatten::test_unflatten_submodule_ordering, test/export/test_unflatten.py::TestUnflatten::test_unflatten_with_inplace_compile, test/export/test_unflatten.py::TestUnflatten::test_unflatten_wrong_input, test/export/test_unflatten.py::TestUnflatten::test_unflattened_module_nodes_has_meta_val
2025-12-04T15:54:21.6340093Z 
2025-12-04T15:54:21.6340435Z Finished export/test_unflatten 1/1 ... [2025-12-04 15:54:21.631104][24019.24101308], took 0.41min
2025-12-04T15:54:21.6685290Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-ab02733f663f09d1.xml
2025-12-04T15:54:21.7344606Z Running dynamo/test_verify_correctness 1/1 ... [2025-12-04 15:54:21.734181][24019.344090123]
2025-12-04T15:54:21.7345248Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:54:21.7348233Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_verify_correctness.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:54:21.734602]
2025-12-04T15:54:29.7104628Z 
2025-12-04T15:54:29.7105753Z dynamo/test_verify_correctness 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_verify_correctness_1.1_c32bdac20cc2dbcb_.log
2025-12-04T15:54:29.7108663Z Running 4 items in this shard: test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_example_inputs, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_incorrect_verify_false, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_incorrect_verify_true, test/dynamo/test_verify_correctness.py::TestVerifyCorrectness::test_torchscript
2025-12-04T15:54:29.7110605Z 
2025-12-04T15:54:29.7111149Z Finished dynamo/test_verify_correctness 1/1 ... [2025-12-04 15:54:29.710269][24027.320176707], took 0.13min
2025-12-04T15:54:29.7477087Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-a822576ee13d2405.xml
2025-12-04T15:54:29.8251946Z Running inductor/test_fxir_backend 1/1 ... [2025-12-04 15:54:29.824848][24027.434755369]
2025-12-04T15:54:29.8252884Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:54:29.8257231Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'inductor/test_fxir_backend.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:54:29.825338]
2025-12-04T15:55:34.3417254Z 
2025-12-04T15:55:34.3418285Z inductor/test_fxir_backend 1/1 was successful, full logs can be found in artifacts with path test/test-reports/inductor.test_fxir_backend_1.1_615cfb6d9761ce74_.log
2025-12-04T15:55:34.3447994Z Running 73 items in this shard: test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_False_use_dynamic_shapes_False, test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_False_use_dynamic_shapes_True, test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_True_use_dynamic_shapes_False, test/inductor/test_fxir_backend.py::FxirTestCase::test_autotune_enable_tuning_True_use_dynamic_shapes_True, test/inductor/test_fxir_backend.py::FxirTestCase::test_backward, test/inductor/test_fxir_backend.py::FxirTestCase::test_basic, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_inputs, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_reinterpret_view, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_to_alloc, test/inductor/test_fxir_backend.py::FxirTestCase::test_cat_views, test/inductor/test_fxir_backend.py::FxirTestCase::test_cond_no_operands_pred_False, test/inductor/test_fxir_backend.py::FxirTestCase::test_cond_no_operands_pred_True, test/inductor/test_fxir_backend.py::FxirTestCase::test_cond_subgraph_pred_False, test/inductor/test_fxir_backend.py::FxirTestCase::test_cond_subgraph_pred_True, test/inductor/test_fxir_backend.py::FxirTestCase::test_cpp_raises, test/inductor/test_fxir_backend.py::FxirTestCase::test_custom_compiler, test/inductor/test_fxir_backend.py::FxirTestCase::test_custom_triton, test/inductor/test_fxir_backend.py::FxirTestCase::test_debug, test/inductor/test_fxir_backend.py::FxirTestCase::test_device_type, test/inductor/test_fxir_backend.py::FxirTestCase::test_duplicate_input, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_launch_grid_calc, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_and_strides, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_precomputed_size, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_with_padding_shape0, 
test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_with_padding_shape1, test/inductor/test_fxir_backend.py::FxirTestCase::test_dynamic_shapes_with_padding_shape2, test/inductor/test_fxir_backend.py::FxirTestCase::test_export_const_placeholder_const_1, test/inductor/test_fxir_backend.py::FxirTestCase::test_export_const_placeholder_const_1_5, test/inductor/test_fxir_backend.py::FxirTestCase::test_extern, test/inductor/test_fxir_backend.py::FxirTestCase::test_extern_multi_output, test/inductor/test_fxir_backend.py::FxirTestCase::test_fallback, test/inductor/test_fxir_backend.py::FxirTestCase::test_fallback_tuple_constant_arg, test/inductor/test_fxir_backend.py::FxirTestCase::test_free, test/inductor/test_fxir_backend.py::FxirTestCase::test_index_put_fallback, test/inductor/test_fxir_backend.py::FxirTestCase::test_multiple_kernels, test/inductor/test_fxir_backend.py::FxirTestCase::test_output_slice_view, test/inductor/test_fxir_backend.py::FxirTestCase::test_reshape_output, test/inductor/test_fxir_backend.py::FxirTestCase::test_scatter_fallback_scalar_src, test/inductor/test_fxir_backend.py::FxirTestCase::test_scatter_reduce_fallback, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_add, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_const, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_dynamic, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_aoti_fx_linear, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_cond_dynamic_shape_pred_scalar_closure_length_4, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_cond_dynamic_shape_pred_scalar_closure_length_8, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_cond_multi_inputs_and_outputs_pred_False, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_cond_multi_inputs_and_outputs_pred_True, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_const_folded_subgraph, 
test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_custom_backend, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_custom_triton_autotune_dynamic, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_dims_dynamic_outer_static_padded_inner, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_dynamic_input_expr_expr0, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_dynamic_input_expr_expr1, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_dynamic_scalar_output, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_False_input__1_5, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_False_input__2, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_False_input__False, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_True_input__1_5, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_True_input__2, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_item_dynamic_True_input__False, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_mismatched_branch_dynamic_pred_False, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_mismatched_branch_dynamic_pred_True, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_reshape_dynamic_ph, test/inductor/test_fxir_backend.py::AOTFxirTestCase::test_reshape_dynamic_tmd, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_launch_grid_dynamic_padding, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_no_distribute_mul_floordiv, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_no_rewrite_div, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_rational_multi_pows, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_rewrite_floor_div_mul_pow, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_rewrite_floor_div_mul_rational, 
test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_rewrite_floor_div_nested, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_rewrite_floor_div_rational_const, test/inductor/test_fxir_backend.py::TestReplaceFloorDiv::test_variable_exp
2025-12-04T15:55:34.3476831Z 
2025-12-04T15:55:34.3477180Z Finished inductor/test_fxir_backend 1/1 ... [2025-12-04 15:55:34.341631][24091.951536658], took 1.08min
2025-12-04T15:55:34.3788241Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/inductor.test_fxir_backend/inductor.test_fxir_backend-0ddc410876940750.xml
2025-12-04T15:55:34.4595613Z Running dynamo/test_structured_trace 1/1 ... [2025-12-04 15:55:34.459251][24092.069159625]
2025-12-04T15:55:34.4596192Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:55:34.4599299Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_structured_trace.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:55:34.459665]
2025-12-04T15:56:20.4921666Z 
2025-12-04T15:56:20.4922758Z dynamo/test_structured_trace 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_structured_trace_1.1_e2032e57f1fbb9a7_.log
2025-12-04T15:56:20.4936309Z Running 29 items in this shard: test/dynamo/test_structured_trace.py::StructuredTraceTest::test_chromium_event, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_codecache, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_collective_schedule_empty, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_collective_schedule_real, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_compile_id_serialization_deserialization, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_compiled_autograd_attribution, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_compiled_autograd_chromium, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_compiled_autograd_id, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_cudagraphs, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_ddp_graphs, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_dump_file, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_dynamo_error, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_example_fn, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_example_training_fn, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_graph_breaks, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_graph_execution_order, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_graph_sizes_dynamic, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_guards_recompiles, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_inductor_error, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_make_fx_fail_partial, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_recompile_user_contexts, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_recompile_user_contexts_iteration, 
test/dynamo/test_structured_trace.py::StructuredTraceTest::test_recompiles, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_runtime_estimates_mixed, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_runtime_estimates_simple, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_schedule, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_tensor_metadata_logging, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_tensor_metadata_logging_dynamic_shapes, test/dynamo/test_structured_trace.py::StructuredTraceTest::test_tensor_metadata_logging_multiple_ops
2025-12-04T15:56:20.4949004Z 
2025-12-04T15:56:20.4949372Z Finished dynamo/test_structured_trace 1/1 ... [2025-12-04 15:56:20.492030][24138.101936537], took 0.77min
2025-12-04T15:56:20.5289231Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_structured_trace/dynamo.test_structured_trace-c4539ed3e1c3f3d2.xml
2025-12-04T15:56:20.6142067Z Running dynamo/test_torchrec 1/1 ... [2025-12-04 15:56:20.613965][24138.223875256]
2025-12-04T15:56:20.6142645Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:56:20.6145364Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_torchrec.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:20.614316]
2025-12-04T15:56:25.4530450Z 
2025-12-04T15:56:25.4531400Z dynamo/test_torchrec 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_torchrec_1.1_ef7e4418db36eb14_.log
2025-12-04T15:56:25.4532551Z Running 0 items in this shard:
2025-12-04T15:56:25.4532768Z 
2025-12-04T15:56:25.4533119Z Finished dynamo/test_torchrec 1/1 ... [2025-12-04 15:56:25.452872][24143.062780842], took 0.08min
2025-12-04T15:56:25.4894869Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_torchrec/dynamo.test_torchrec-a739d4d8dd7fe6db.xml
2025-12-04T15:56:25.5187244Z Running test_model_exports_to_core_aten 1/1 ... [2025-12-04 15:56:25.518520][24143.128429544]
2025-12-04T15:56:25.5187831Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:56:25.5190885Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_model_exports_to_core_aten.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:25.518861]
2025-12-04T15:56:30.8404513Z 
2025-12-04T15:56:30.8405665Z test_model_exports_to_core_aten 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_model_exports_to_core_aten_1.1_1858ccc543938d86_.log
2025-12-04T15:56:30.8407033Z Running 1 items in this shard: test/test_model_exports_to_core_aten.py::TestQuantizePT2EModels::test_vit_aten_export
2025-12-04T15:56:30.8407642Z 
2025-12-04T15:56:30.8408019Z Finished test_model_exports_to_core_aten 1/1 ... [2025-12-04 15:56:30.840236][24148.450144237], took 0.09min
2025-12-04T15:56:30.8775567Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_model_exports_to_core_aten/test_model_exports_to_core_aten-ca8aa6cdcebd4c55.xml
2025-12-04T15:56:30.9140070Z Running dynamo/test_precompile_context 1/1 ... [2025-12-04 15:56:30.913816][24148.523726995]
2025-12-04T15:56:30.9140647Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:56:30.9143833Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_precompile_context.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:30.914170]
2025-12-04T15:56:48.4528580Z 
2025-12-04T15:56:48.4529745Z dynamo/test_precompile_context 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_precompile_context_1.1_a5d2ca6b4ab870b9_.log
2025-12-04T15:56:48.4532094Z Running 3 items in this shard: test/dynamo/test_precompile_context.py::PrecompileContextTests::test_basic, test/dynamo/test_precompile_context.py::PrecompileContextTests::test_editable, test/dynamo/test_precompile_context.py::PrecompileContextTests::test_serialize_by_key
2025-12-04T15:56:48.4533653Z 
2025-12-04T15:56:48.4534038Z Finished dynamo/test_precompile_context 1/1 ... [2025-12-04 15:56:48.452669][24166.062578006], took 0.29min
2025-12-04T15:56:48.4902818Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_precompile_context/dynamo.test_precompile_context-d3b456bb7c9f74bf.xml
2025-12-04T15:56:48.5749676Z Running dynamo/test_trace_rules 1/1 ... [2025-12-04 15:56:48.574735][24166.184644581]
2025-12-04T15:56:48.5750236Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:56:48.5753475Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_trace_rules.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:48.575112]
2025-12-04T15:56:57.0508977Z 
2025-12-04T15:56:57.0509982Z dynamo/test_trace_rules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_trace_rules_1.1_6759ebf57891eeeb_.log
2025-12-04T15:56:57.0513888Z Running 7 items in this shard: test/dynamo/test_trace_rules.py::TraceRuleTests::test_almost_impossible_missing_name, test/dynamo/test_trace_rules.py::TraceRuleTests::test_force_inline_custom_function, test/dynamo/test_trace_rules.py::TraceRuleTests::test_force_inline_torch_function, test/dynamo/test_trace_rules.py::TraceRuleTests::test_no_special_handlers_for_torch_non_c_bindings, test/dynamo/test_trace_rules.py::TraceRuleTests::test_skipfiles_inlinelist, test/dynamo/test_trace_rules.py::TraceRuleTests::test_torch_name_rule_map_updated, test/dynamo/test_trace_rules.py::TestModuleSurviveSkipFiles::test_module_survive_skip_files
2025-12-04T15:56:57.0516937Z 
2025-12-04T15:56:57.0517281Z Finished dynamo/test_trace_rules 1/1 ... [2025-12-04 15:56:57.050733][24174.660639587], took 0.14min
2025-12-04T15:56:57.0883588Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_trace_rules/dynamo.test_trace_rules-cb7e3d7c5a436002.xml
2025-12-04T15:56:57.1640941Z Running export/test_upgrader 1/1 ... [2025-12-04 15:56:57.163885][24174.773794717]
2025-12-04T15:56:57.1641467Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:56:57.1644786Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_upgrader.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:56:57.164255]
2025-12-04T15:57:02.3856006Z 
2025-12-04T15:57:02.3857025Z export/test_upgrader 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_upgrader_1.1_ed15a90621ede266_.log
2025-12-04T15:57:02.3860411Z Running 6 items in this shard: test/export/test_upgrader.py::TestUpgrader::test_field_renaming_chain_from_v0_complete, test/export/test_upgrader.py::TestUpgrader::test_field_renaming_chain_from_v0_missing_field, test/export/test_upgrader.py::TestUpgrader::test_field_renaming_from_v1_partial_chain, test/export/test_upgrader.py::TestUpgrader::test_nn_module_stack_error_handling_invalid_type, test/export/test_upgrader.py::TestUpgrader::test_nn_module_stack_transformation_from_v0, test/export/test_upgrader.py::TestUpgrader::test_nodes_without_metadata_handled_gracefully
2025-12-04T15:57:02.3863139Z 
2025-12-04T15:57:02.3863489Z Finished export/test_upgrader 1/1 ... [2025-12-04 15:57:02.385404][24179.99531236], took 0.09min
2025-12-04T15:57:02.4235190Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_upgrader/export.test_upgrader-e574684e7a6f5e02.xml
2025-12-04T15:57:02.4533057Z Running dynamo/test_hooks 1/1 ... [2025-12-04 15:57:02.453129][24180.063039746]
2025-12-04T15:57:02.4533762Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:57:02.4537238Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_hooks.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:57:02.453486]
2025-12-04T15:57:31.9089737Z 
2025-12-04T15:57:31.9090646Z dynamo/test_hooks 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_hooks_1.1_66426e5cf57243c0_.log
2025-12-04T15:57:31.9104356Z Running 34 items in this shard: test/dynamo/test_hooks.py::HooksTests::test_complex_state_mutation_in_intermediary_hooks_same_on_inductor, test/dynamo/test_hooks.py::HooksTests::test_complex_state_mutation_in_intermediary_hooks_same_on_inductor_with_graph_break, test/dynamo/test_hooks.py::HooksTests::test_functools_arg_vary, test/dynamo/test_hooks.py::HooksTests::test_global_module_forward_pre_hook, test/dynamo/test_hooks.py::HooksTests::test_hook_with_closure, test/dynamo/test_hooks.py::HooksTests::test_hook_with_nested_closure, test/dynamo/test_hooks.py::HooksTests::test_input_hooks_same, test/dynamo/test_hooks.py::HooksTests::test_intermediary_hooks, test/dynamo/test_hooks.py::HooksTests::test_intermediary_hooks_same_on_aot_eager, test/dynamo/test_hooks.py::HooksTests::test_intermediary_hooks_same_on_inductor, test/dynamo/test_hooks.py::HooksTests::test_intermediate_hook_with_closure_aot, test/dynamo/test_hooks.py::HooksTests::test_intermediate_hook_with_closure_eager, test/dynamo/test_hooks.py::HooksTests::test_nnmodule_hook_guards, test/dynamo/test_hooks.py::HooksTests::test_no_recompile_on_hook_identity_change, test/dynamo/test_hooks.py::HooksTests::test_no_recompile_on_same_hook, test/dynamo/test_hooks.py::HooksTests::test_post_acc_grad_hook, test/dynamo/test_hooks.py::HooksTests::test_recompile, test/dynamo/test_hooks.py::HooksTests::test_register_hook_partial_guarding, test/dynamo/test_hooks.py::HooksTests::test_removed_handle_return, test/dynamo/test_hooks.py::HooksTests::test_tensor_only_register_hook_in_graph_lambda, test/dynamo/test_hooks.py::HooksTests::test_tensor_only_register_hook_in_graph_local, test/dynamo/test_hooks.py::HooksTests::test_tensor_only_register_hook_in_graph_local_inner, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_global_hook, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_global_hooks_handles_in_list, 
test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_in_graph_break_handle_lambda, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_in_graph_break_handle_local, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_in_graph_lambda, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_in_graph_local, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_multi_handle_return, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_repeated_handle_not_local, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_hook_repeated_handle_return, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_multiple_hooks, test/dynamo/test_hooks.py::HooksTests::test_tensor_register_multiple_hooks_handles_in_list, test/dynamo/test_hooks.py::HooksTests::test_wrap_top_frame_with_hooks
2025-12-04T15:57:31.9117152Z 
2025-12-04T15:57:31.9117488Z Finished dynamo/test_hooks 1/1 ... [2025-12-04 15:57:31.908799][24209.518707425], took 0.49min
2025-12-04T15:57:31.9464405Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_hooks/dynamo.test_hooks-05127548b561fef1.xml
2025-12-04T15:57:32.0243188Z Running dynamo/test_generator 1/1 ... [2025-12-04 15:57:32.024084][24209.633993562]
2025-12-04T15:57:32.0243733Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:57:32.0247051Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'dynamo/test_generator.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:57:32.024460]
2025-12-04T15:57:42.3529843Z 
2025-12-04T15:57:42.3531059Z dynamo/test_generator 1/1 was successful, full logs can be found in artifacts with path test/test-reports/dynamo.test_generator_1.1_f207b5be74916c07_.log
2025-12-04T15:57:42.3562556Z Running 78 items in this shard: test/dynamo/test_generator.py::GeneratorTests::test_cleanup_throw, test/dynamo/test_generator.py::GeneratorTests::test_deque_extendleft, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container0, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container1, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container2, test/dynamo/test_generator.py::GeneratorTests::test_dict_tuple_list_generator_container3, test/dynamo/test_generator.py::GeneratorTests::test_dynamo_disable_generator, test/dynamo/test_generator.py::GeneratorTests::test_dynamo_disable_sub_generator, test/dynamo/test_generator.py::GeneratorTests::test_generator___contains__, test/dynamo/test_generator.py::GeneratorTests::test_generator___contains___side_effects, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument_2, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument_3, test/dynamo/test_generator.py::GeneratorTests::test_generator_as_argument_4, test/dynamo/test_generator.py::GeneratorTests::test_generator_simple, test/dynamo/test_generator.py::GeneratorTests::test_generator_with_side_effects, test/dynamo/test_generator.py::GeneratorTests::test_generator_with_side_effects_graph_break, test/dynamo/test_generator.py::GeneratorTests::test_generator_with_side_effects_graph_break_2, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_and_reconstruct_generator, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_before_calling_generator, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_in_generator, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_in_generator_2, test/dynamo/test_generator.py::GeneratorTests::test_graph_break_in_generator_while_reconstructing, 
test/dynamo/test_generator.py::GeneratorTests::test_graph_break_outside_generator, test/dynamo/test_generator.py::GeneratorTests::test_infinite_generator, test/dynamo/test_generator.py::GeneratorTests::test_infinite_generator_2, test/dynamo/test_generator.py::GeneratorTests::test_infinite_generator_3, test/dynamo/test_generator.py::GeneratorTests::test_islice_chain, test/dynamo/test_generator.py::GeneratorTests::test_iter, test/dynamo/test_generator.py::GeneratorTests::test_list_extend, test/dynamo/test_generator.py::GeneratorTests::test_list_zip_generator, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_tensor_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_dict_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_dict_mutation_before, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_local_var_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_object_mutation, test/dynamo/test_generator.py::GeneratorTests::test_reconstruct_generator_with_object_mutation_before, test/dynamo/test_generator.py::GeneratorTests::test_return_advanced_generator, test/dynamo/test_generator.py::GeneratorTests::test_return_exhaust_generator, test/dynamo/test_generator.py::GeneratorTests::test_return_generator, test/dynamo/test_generator.py::GeneratorTests::test_return_subgenerator, test/dynamo/test_generator.py::GeneratorTests::test_return_tuple_generator, test/dynamo/test_generator.py::GeneratorTests::test_subgenerator, test/dynamo/test_generator.py::GeneratorTests::test_subgenerator_with_side_effects, test/dynamo/test_generator.py::GeneratorTests::test_zip_generator, test/dynamo/test_generator.py::GeneratorTests::test_zip_generator_2, test/dynamo/test_generator.py::GeneratorTests::test_zip_infinite_generator, test/dynamo/test_generator.py::GeneratorTests::test_zip_subgenerator, 
test/dynamo/test_generator.py::TestGeneratorSend::test_send, test/dynamo/test_generator.py::TestGeneratorSend::test_send_stop_iteration_fullgraph_False, test/dynamo/test_generator.py::TestGeneratorSend::test_send_stop_iteration_fullgraph_True, test/dynamo/test_generator.py::TestGeneratorClose::test_close, test/dynamo/test_generator.py::TestGeneratorClose::test_close_after_close, test/dynamo/test_generator.py::TestGeneratorClose::test_close_after_exception, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_GeneratorExit_fullgraph_False, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_GeneratorExit_fullgraph_True, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_GeneratorExit_return, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_and_reraise_GeneratorExit, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_and_reraise_exc_exc0, test/dynamo/test_generator.py::TestGeneratorClose::test_close_capture_and_reraise_exc_exc1, test/dynamo/test_generator.py::TestGeneratorClose::test_close_handling_finally, test/dynamo/test_generator.py::TestGeneratorClose::test_close_subgen, test/dynamo/test_generator.py::TestGeneratorClose::test_close_with_side_effects, test/dynamo/test_generator.py::TestGeneratorClose::test_close_with_subgen, test/dynamo/test_generator.py::TestGeneratorClose::test_next_after_close_fullgraph_False, test/dynamo/test_generator.py::TestGeneratorClose::test_next_after_close_fullgraph_True, test/dynamo/test_generator.py::TestGeneratorThrow::test_exception_context_with_yield, test/dynamo/test_generator.py::TestGeneratorThrow::test_return_None_in_except_and_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_return_const_value_in_except_and_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_return_value_in_except_and_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw, 
test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_no_yield_after_throw, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_not_catch, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_raise_difference_exc, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_try_except_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_with_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_without_finally, test/dynamo/test_generator.py::TestGeneratorThrow::test_throw_yield_finally
2025-12-04T15:57:42.3593485Z 
2025-12-04T15:57:42.3593831Z Finished dynamo/test_generator 1/1 ... [2025-12-04 15:57:42.352850][24219.962757614], took 0.17min
2025-12-04T15:57:42.3910383Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/dynamo.test_generator/dynamo.test_generator-92f221726c5985b1.xml
2025-12-04T15:57:42.4707059Z Running export/test_verifier 1/1 ... [2025-12-04 15:57:42.470501][24220.080410947]
2025-12-04T15:57:42.4707657Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:57:42.4711446Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_verifier.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:57:42.470874]
2025-12-04T15:57:51.0466685Z 
2025-12-04T15:57:51.0467683Z export/test_verifier 1/1 was successful, full logs can be found in artifacts with path test/test-reports/export.test_verifier_1.1_96a0b4295b5beb1c_.log
2025-12-04T15:57:51.0472217Z Running 10 items in this shard: test/export/test_verifier.py::TestVerifier::test_ep_verifier_basic, test/export/test_verifier.py::TestVerifier::test_ep_verifier_buffer_mutate, test/export/test_verifier.py::TestVerifier::test_ep_verifier_invalid_buffer, test/export/test_verifier.py::TestVerifier::test_ep_verifier_invalid_output, test/export/test_verifier.py::TestVerifier::test_ep_verifier_invalid_param, test/export/test_verifier.py::TestVerifier::test_verifier_basic, test/export/test_verifier.py::TestVerifier::test_verifier_call_module, test/export/test_verifier.py::TestVerifier::test_verifier_higher_order, test/export/test_verifier.py::TestVerifier::test_verifier_nested_invalid_module, test/export/test_verifier.py::TestVerifier::test_verifier_no_functional
2025-12-04T15:57:51.0475996Z 
2025-12-04T15:57:51.0476321Z Finished export/test_verifier 1/1 ... [2025-12-04 15:57:51.046493][24228.656403766], took 0.14min
2025-12-04T15:57:51.0841990Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_verifier/export.test_verifier-edb630c9e71930f9.xml
2025-12-04T15:57:51.1661602Z Running export/test_sparse 2/2 ... [2025-12-04 15:57:51.165927][24228.775837032]
2025-12-04T15:57:51.1662158Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T15:57:51.1665453Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'export/test_sparse.py', '--shard-id=2', '--num-shards=2', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 15:57:51.166297]
2025-12-04T16:03:05.8392931Z 
2025-12-04T16:03:05.8394053Z export/test_sparse 2/2 was successful, full logs can be found in artifacts with path test/test-reports/export.test_sparse_2.2_dc3ae5c04c4515a4_.log
2025-12-04T16:03:05.8434304Z Running 97 items in this shard: test/export/test_sparse.py::TestSparseProp::test_activation_coo, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_bfloat16_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_bfloat16_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_bfloat16_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_bfloat16_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_bfloat16_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float16_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float32_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float32_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float32_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float32_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_float64_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int32_SparseBSC, 
test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_eltwisenet_int64_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_bfloat16_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_bfloat16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_bfloat16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_float16_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float32_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_idnet_float64_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_float64_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_idnet_float64_int64_SparseCSR, 
test/export/test_sparse.py::TestSparseProp::test_idnet_int64_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_idnet_int64_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_idnet_int64_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_bfloat16_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_bfloat16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_bfloat16_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float16_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float32_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float32_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float32_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float32_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_sumnet_float32_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float64_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float64_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_float64_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_float64_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int32_SparseCSC, 
test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int64_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_sumnet_int64_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int32_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_todensenet_bfloat16_int64_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_float16_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_todensenet_float16_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_float16_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_float16_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_todensenet_float32_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_float32_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_float32_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_float32_int64_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_todensenet_float64_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_float64_int32_SparseCSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_float64_int64_SparseBSR, test/export/test_sparse.py::TestSparseProp::test_todensenet_float64_int64_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_int64_int32_SparseBSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_int64_int32_SparseBSR, 
test/export/test_sparse.py::TestSparseProp::test_todensenet_int64_int32_SparseCOO, test/export/test_sparse.py::TestSparseProp::test_todensenet_int64_int32_SparseCSC, test/export/test_sparse.py::TestSparseProp::test_todensenet_int64_int64_SparseBSC
2025-12-04T16:03:05.8472971Z 
2025-12-04T16:03:05.8473285Z Finished export/test_sparse 2/2 ... [2025-12-04 16:03:05.839176][24543.449082951], took 5.24min
2025-12-04T16:03:05.8776741Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/export.test_sparse/export.test_sparse-c54c4a64a1413ccc.xml
2025-12-04T16:03:05.9560193Z Running functorch/test_ac 1/1 ... [2025-12-04 16:03:05.955803][24543.565712728]
2025-12-04T16:03:05.9561051Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:03:05.9564602Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/test_ac.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:03:05.956187]
2025-12-04T16:03:43.6204816Z 
2025-12-04T16:03:43.6205929Z functorch/test_ac 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.test_ac_1.1_99b1ba004ab023a0_.log
2025-12-04T16:03:43.6210453Z Running 9 items in this shard: test/functorch/test_ac.py::MemoryBudgetTest::test_attention_vs_linear, test/functorch/test_ac.py::MemoryBudgetTest::test_custom_triton_kernel, test/functorch/test_ac.py::MemoryBudgetTest::test_manual_ac, test/functorch/test_ac.py::MemoryBudgetTest::test_matmul_even_chain, test/functorch/test_ac.py::MemoryBudgetTest::test_matmul_uneven_chain, test/functorch/test_ac.py::MemoryBudgetTest::test_prioritize_cheaper_matmul, test/functorch/test_ac.py::MemoryBudgetTest::test_prioritize_cheaper_matmul2, test/functorch/test_ac.py::MemoryBudgetTest::test_profile, test/functorch/test_ac.py::MemoryBudgetTest::test_rematerializes_cheap
2025-12-04T16:03:43.6213661Z 
2025-12-04T16:03:43.6213980Z Finished functorch/test_ac 1/1 ... [2025-12-04 16:03:43.620288][24581.230197412], took 0.63min
2025-12-04T16:03:43.6588569Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.test_ac/functorch.test_ac-9bf963042854be08.xml
2025-12-04T16:03:43.7314940Z Running test_out_dtype_op 1/1 ... [2025-12-04 16:03:43.731238][24581.341148437]
2025-12-04T16:03:43.7315560Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:03:43.7318715Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_out_dtype_op.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:03:43.731592]
2025-12-04T16:03:51.6062213Z 
2025-12-04T16:03:51.6063453Z test_out_dtype_op 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_out_dtype_op_1.1_3e48e335f34b8277_.log
2025-12-04T16:03:51.6068367Z Running 12 items in this shard: test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_dynamo, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_inductor_decomp, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_inductor_decomp_trace, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_int_mm_default_trace, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_make_fx, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_mm_numerical, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_mul_scalar_numerical, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_no_autograd, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_non_functional, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_non_op_overload, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_op_functional, test/test_out_dtype_op.py::TestOutDtypeOp::test_out_dtype_wrong_output
2025-12-04T16:03:51.6072579Z 
2025-12-04T16:03:51.6072872Z Finished test_out_dtype_op 1/1 ... [2025-12-04 16:03:51.606030][24589.215940105], took 0.13min
2025-12-04T16:03:51.6445798Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_out_dtype_op/test_out_dtype_op-014adb2ecaedb28b.xml
2025-12-04T16:03:51.7280841Z Running torch_np/test_ufuncs_basic 1/1 ... [2025-12-04 16:03:51.727836][24589.337745972]
2025-12-04T16:03:51.7281513Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:03:51.7284900Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/test_ufuncs_basic.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:03:51.728192]
2025-12-04T16:03:57.7506331Z 
2025-12-04T16:03:57.7507817Z torch_np/test_ufuncs_basic 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.test_ufuncs_basic_1.1_5b79d2f51b6173f9_.log
2025-12-04T16:03:57.7701129Z Running 371 items in this shard: test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_scalar_ufunc0, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_equiv_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_equiv_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_equiv_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_no_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_no_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_no_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_safe_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_safe_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_safe_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_same_kind_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_same_kind_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_same_kind_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_unsafe_ufunc0_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_unsafe_ufunc0_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_casting_casting_unsafe_ufunc0_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_dtype_ufunc0, 
test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_broadcast_ufunc0, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_equiv_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_equiv_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_equiv_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_no_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_no_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_no_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_safe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_safe_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_safe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_same_kind_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_same_kind_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_same_kind_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_unsafe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_unsafe_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestUnaryUfuncs::test_x_and_out_casting_casting_unsafe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc0, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc1, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc10, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc11, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc12, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc13, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc14, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc15, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc16, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc2, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc3, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc4, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc5, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc6, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc7, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc8, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_scalar_ufunc9, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc0, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc1, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc10, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc11, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc12, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc13, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc14, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc15, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc16, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc2, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc3, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc4, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc5, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc6, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc7, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc8, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_vector_vs_scalar_ufunc9, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc0, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc1, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc10, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc11, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc12, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc13, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc14, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc15, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc16, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc2, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc3, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc4, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc5, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc6, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc7, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc8, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_broadcast_ufunc9, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc13_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc3_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc7_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc8_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_equiv_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc13_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc2_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc7_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc8_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_no_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc12_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc2_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc7_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc7_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_safe_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc10_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc11_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc16_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc1_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc5_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc6_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc7_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_same_kind_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc0_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc0_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc0_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc10_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc10_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc10_out_dtype_float64, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc11_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc11_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc11_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc12_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc12_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc12_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc13_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc13_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc13_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc14_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc14_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc14_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc15_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc15_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc15_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc16_out_dtype_complex128, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc16_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc16_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc1_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc1_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc1_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc2_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc2_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc2_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc3_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc3_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc3_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc4_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc4_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc4_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc5_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc5_out_dtype_float32, 
test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc5_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc6_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc6_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc6_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc7_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc7_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc7_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc8_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc8_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc8_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc9_out_dtype_complex128, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc9_out_dtype_float32, test/torch_np/test_ufuncs_basic.py::TestBinaryUfuncs::test_xy_and_out_casting_casting_unsafe_ufunc9_out_dtype_float64, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_basic_ufunc0_op0_iop0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_basic_ufunc1_op1_iop1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_basic_ufunc2_op2_iop2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_bcast_ufunc0_op0_iop0, 
test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_bcast_ufunc1_op1_iop1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_bcast_ufunc2_op2_iop2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc0_op0_iop0_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc1_op1_iop1_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_array_ufunc2_op2_iop2_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc0_op0_iop0_other_dtype3, 
test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc1_op1_iop1_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype0, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype1, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype2, test/torch_np/test_ufuncs_basic.py::TestNdarrayDunderVsUfunc::test_other_scalar_ufunc2_op2_iop2_other_dtype3, test/torch_np/test_ufuncs_basic.py::TestUfuncDtypeKwd::test_binary_ufunc_dtype, test/torch_np/test_ufuncs_basic.py::TestUfuncDtypeKwd::test_binary_ufunc_dtype_and_out
2025-12-04T16:03:57.7889732Z 
2025-12-04T16:03:57.7890090Z Finished torch_np/test_ufuncs_basic 1/1 ... [2025-12-04 16:03:57.751104][24595.36101202], took 0.10min
2025-12-04T16:03:57.7894855Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.test_ufuncs_basic/torch_np.test_ufuncs_basic-614b306d768a8662.xml
2025-12-04T16:03:57.8678471Z Running lazy/test_step_closures 1/1 ... [2025-12-04 16:03:57.867594][24595.477504012]
2025-12-04T16:03:57.8679176Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:03:57.8682168Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'lazy/test_step_closures.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:03:57.867932]
2025-12-04T16:04:04.8414570Z 
2025-12-04T16:04:04.8415676Z lazy/test_step_closures 1/1 was successful, full logs can be found in artifacts with path test/test-reports/lazy.test_step_closures_1.1_f2cf8fda3341fdfb_.log
2025-12-04T16:04:04.8417859Z Running 4 items in this shard: test/lazy/test_step_closures.py::ClosuresTest::test_asynchronous, test/lazy/test_step_closures.py::ClosuresTest::test_asynchronous_exception, test/lazy/test_step_closures.py::ClosuresTest::test_synchronous, test/lazy/test_step_closures.py::ClosuresTest::test_synchronous_exception
2025-12-04T16:04:04.8419613Z 
2025-12-04T16:04:04.8419954Z Finished lazy/test_step_closures 1/1 ... [2025-12-04 16:04:04.841265][24602.451174716], took 0.12min
2025-12-04T16:04:04.8802437Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/lazy.test_step_closures/lazy.test_step_closures-4de838954d52331d.xml
2025-12-04T16:04:04.9727604Z Running functorch/dim/test_getsetitem 1/1 ... [2025-12-04 16:04:04.972551][24602.582460879]
2025-12-04T16:04:04.9728208Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:04:04.9731474Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'functorch/dim/test_getsetitem.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:04:04.972920]
2025-12-04T16:04:09.9939602Z 
2025-12-04T16:04:09.9940690Z functorch/dim/test_getsetitem 1/1 was successful, full logs can be found in artifacts with path test/test-reports/functorch.dim.test_getsetitem_1.1_f956801402f0c75a_.log
2025-12-04T16:04:09.9949276Z Running 19 items in this shard: test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_basic_dim_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_boolean_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_complex_mixed_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_device_handling_cpu, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_dim_pack_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_dimlist_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_edge_cases, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_ellipsis_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_error_conditions, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_inferred_dimension_binding, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_mixed_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_multiple_dim_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_none_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_repeated_dim_usage, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_slice_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_stride_calculation, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_tensor_indexing, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_unbound_dim_binding, test/functorch/dim/test_getsetitem.py::TestGetSetItem::test_unbound_dimlist_indexing
2025-12-04T16:04:09.9957088Z 
2025-12-04T16:04:09.9957466Z Finished functorch/dim/test_getsetitem 1/1 ... [2025-12-04 16:04:09.993745][24607.603652977], took 0.08min
2025-12-04T16:04:10.0330097Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/functorch.dim.test_getsetitem/functorch.dim.test_getsetitem-d5e6ac7560412ef9.xml
2025-12-04T16:04:10.1398914Z Running test_fx 1/1 ... [2025-12-04 16:04:10.139676][24607.749585416]
2025-12-04T16:04:10.1399391Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:04:10.1402989Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_fx.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:04:10.140055]
2025-12-04T16:08:24.4304319Z 
2025-12-04T16:08:24.4305154Z test_fx 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_fx_1.1_fe3aedf5a60597eb_.log
2025-12-04T16:08:24.4890025Z Running 1280 items in this shard: test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationInput_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationMetadata_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationMetadata_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationTorchTensorCall_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_MutationTorchTensorCall_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_Mutation_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_Mutation_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_ReturnList_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_ReturnList_cuda, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_TakeList_cpu, test/test_fx.py::TestCommonPass::test_correctness_CSEPass_TakeList_cuda, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_FactoryFunctionCall_cuda, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_MutationFactory_cpu, test/test_fx.py::TestCommonPass::test_correctness_factory_CSEPass_MutationFactory_cuda, test/test_fx.py::TestCSEPass::test_banned_list, test/test_fx.py::TestCSEPass::test_empty, test/test_fx.py::TestCSEPass::test_immutable_list_multiple_entries, test/test_fx.py::TestCSEPass::test_immutable_list_type, test/test_fx.py::TestCSEPass::test_kwarg, test/test_fx.py::TestCSEPass::test_nested_immutable_list_type, test/test_fx.py::TestCSEPass::test_nochange, test/test_fx.py::TestCSEPass::test_rand_like, test/test_fx.py::TestCSEPass::test_rand_n, test/test_fx.py::TestCSEPass::test_random, test/test_fx.py::TestCSEPass::test_simple, test/test_fx.py::TestCSEPass::test_simple_2, test/test_fx.py::TestCSEPass::test_simple_multiple_same_ops, 
test/test_fx.py::TestCSEPass::test_two_args, test/test_fx.py::TestCSEPass::test_two_args_default, test/test_fx.py::TestDCE::test_dead_chain, test/test_fx.py::TestDCE::test_dead_getattr, test/test_fx.py::TestDCE::test_dead_placeholder, test/test_fx.py::TestDCE::test_dead_placeholder_with_user, test/test_fx.py::TestDCE::test_impure_custom, test/test_fx.py::TestDCE::test_impure_kwargs, test/test_fx.py::TestDCE::test_impure_nodes_args, test/test_fx.py::TestDCE::test_impure_random, test/test_fx.py::TestDCE::test_keep_collectives, test/test_fx.py::TestDCE::test_keep_collectives_no_overload, test/test_fx.py::TestDCE::test_keep_module_with_side_effects, test/test_fx.py::TestDCE::test_keep_setitem, test/test_fx.py::TestDCE::test_keep_torch_assert, test/test_fx.py::TestDCE::test_simple, test/test_fx.py::TestConstFold::test_check_inline_non_const, test/test_fx.py::TestConstFold::test_check_inline_non_const_mult_return, test/test_fx.py::TestConstFold::test_check_skip_folding_quant_dequant_pattern, test/test_fx.py::TestConstFold::test_const_fold_basic_one_attr_name_collision, test/test_fx.py::TestConstFold::test_const_fold_basic_one_attr_no_name_collision, test/test_fx.py::TestConstFold::test_const_fold_basic_placeholder_reordered, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr, test/test_fx.py::TestConstFold::test_const_fold_basic_two_attr_three_input, test/test_fx.py::TestConstFold::test_const_fold_has_inlined_call_module_node, test/test_fx.py::TestConstFold::test_const_fold_module_attr, test/test_fx.py::TestConstFold::test_const_fold_multi_const_folded_attrs, test/test_fx.py::TestConstFold::test_const_fold_noop, test/test_fx.py::TestConstFold::test_const_fold_partial_graph, test/test_fx.py::TestConstFold::test_const_fold_submod_hierarchy, test/test_fx.py::TestConstFold::test_const_fold_tensor_meta, test/test_fx.py::TestConstFold::test_const_fold_unused_placeholder, test/test_fx.py::TestConstFold::test_dict_output, 
test/test_fx.py::TestConstFold::test_do_not_fold_impure_subgraph, test/test_fx.py::TestConstFold::test_fold_module, test/test_fx.py::TestConstFold::test_fold_pure_subgraph, test/test_fx.py::TestConstFold::test_retain_node_meta, test/test_fx.py::TestConstFold::test_three_outputs, test/test_fx.py::TestConstFold::test_two_outputs, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_dim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_ndim_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_nelement_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_numel_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_shape_const, test/test_fx.py::TestConstParamShapeInControlFlow::test_param_size_const, test/test_fx.py::AnnotationsTest::test_annotate, test/test_fx.py::AnnotationsTest::test_annotations, test/test_fx.py::AnnotationsTest::test_broadcasting1, test/test_fx.py::AnnotationsTest::test_broadcasting2, test/test_fx.py::AnnotationsTest::test_broadcasting3, test/test_fx.py::AnnotationsTest::test_consistency, test/test_fx.py::AnnotationsTest::test_precision, test/test_fx.py::TypeCheckerTest::test_flatten_fully_static, test/test_fx.py::TypeCheckerTest::test_resnet50, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast, test/test_fx.py::TypeCheckerTest::test_symbolic_add_with_broadcast_2, test/test_fx.py::TypeCheckerTest::test_type_check_add_false, test/test_fx.py::TypeCheckerTest::test_type_check_add_true, test/test_fx.py::TypeCheckerTest::test_type_check_add_with_broadcast, test/test_fx.py::TypeCheckerTest::test_type_check_add_with_scalar, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_broadcast, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_2D_false, test/test_fx.py::TypeCheckerTest::test_type_check_batch_norm_symbolic, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D, 
test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_2, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_2_fully_static, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_maxpool2d_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_conv2D_types, test/test_fx.py::TypeCheckerTest::test_type_check_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_flatten3, test/test_fx.py::TypeCheckerTest::test_type_check_flatten_2, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_true, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_dyn_true_param_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_false, test/test_fx.py::TypeCheckerTest::test_type_check_reshape_true, test/test_fx.py::TypeCheckerTest::test_type_check_symbolic_inferenceconv2D_maxpool2d_flatten, test/test_fx.py::TypeCheckerTest::test_type_check_transpose_False, test/test_fx.py::TypeCheckerTest::test_type_check_transpose_true, test/test_fx.py::TypeCheckerTest::test_type_maxpool2d_fully_static, test/test_fx.py::TypeCheckerTest::test_type_typechecl_maxpool2d_3dinput, test/test_fx.py::TypeCheckerTest::test_typecheck_basicblock, test/test_fx.py::TestMatcher::test_matcher_with_name_node_map_function, test/test_fx.py::TestMatcher::test_matcher_with_name_node_map_module, test/test_fx.py::TestMatcher::test_split_to_graph_and_name_node_map, test/test_fx.py::TestMatcher::test_subgraph_matcher_ignore_literals, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_attributes, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list, test/test_fx.py::TestMatcher::test_subgraph_matcher_with_list_bad, test/test_fx.py::TestMatcher::test_variatic_arg_matching, test/test_fx.py::TestPassManager::test_pass_manager, test/test_fx.py::TestPassManager::test_pass_manager_bad_checks, test/test_fx.py::TestPassManager::test_pass_manager_checks, 
test/test_fx.py::TestPassManager::test_pass_manager_error, test/test_fx.py::TestPassManager::test_this_before_that_pass_constraint, test/test_fx.py::TestPassManager::test_topological_sort, test/test_fx.py::TestSourceMatcher::test_legalize_slice, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_conv_relu_maxpool_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_conv_relu_conv_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_functional_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_linear_relu_linear_torch_fn_export_strict_True, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_False, test/test_fx.py::TestSourceMatcher::test_module_partitioner_weight_tied_strict_True, test/test_fx.py::TestSubgraphRewriter::test_matching_pattern_with_list_type_arg, test/test_fx.py::TestSubgraphRewriter::test_matching_variable_arguments, test/test_fx.py::TestSubgraphRewriter::test_replace_pattern_with_callback, test/test_fx.py::TestSubgraphRewriter::test_replace_pattern_with_filters, 
test/test_fx.py::TestSubgraphRewriter::test_replaced_nodes, test/test_fx.py::TestSubgraphRewriter::test_replacement_with_attrs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_annotations_int, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_call_method, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_correct_output_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_graph_argument_order, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_internal_pattern_nodes_cannot_have_users_that_are_not_matched, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_local_revert, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_multiple_pattern_match, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_nodes_with_kwargs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_is_entire_graph, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_pattern_output_pattern_node_can_have_users_that_are_not_matched, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_placeholder_matching, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_preserves_logic, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_consecutive_submodules, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_with_duplicated_outputs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replace_with_multiple_outputs, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_replaces_referenced_submodules, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_single_pattern_match, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_traced_as_callable, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_oneliner_pattern, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_overlapping_matches, 
test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_trivial_replacement, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_args, test/test_fx.py::TestSubgraphRewriter::test_subgraph_rewriter_with_unused_results, test/test_fx.py::TestFX::test_all_input_nodes, test/test_fx.py::TestFX::test_annotation_with_future, test/test_fx.py::TestFX::test_annotations_empty_tuple, test/test_fx.py::TestFX::test_annotations_with_forward_references, test/test_fx.py::TestFX::test_annotations_with_no_forward_references, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_internal_forward_references, test/test_fx.py::TestFX::test_annotations_with_non_torch_reference_and_no_internal_forward_references, test/test_fx.py::TestFX::test_args_kwargs, test/test_fx.py::TestFX::test_args_kwargs_no_self, test/test_fx.py::TestFX::test_ast_rewriter_reassigns_submodules, test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert, test/test_fx.py::TestFX::test_ast_rewriter_rewrites_assert_with_message, test/test_fx.py::TestFX::test_ast_rewriter_wrap, test/test_fx.py::TestFX::test_ast_rewriter_wrap_fn_directly, test/test_fx.py::TestFX::test_ast_rewriter_wrap_with_submodule, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator, test/test_fx.py::TestFX::test_ast_rewriter_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFX::test_autowrap_functions, test/test_fx.py::TestFX::test_concrete_arg_none_assert, test/test_fx.py::TestFX::test_construct_root_dict, test/test_fx.py::TestFX::test_control_flow_tracing, test/test_fx.py::TestFX::test_copy_it, test/test_fx.py::TestFX::test_copy_no_remap, test/test_fx.py::TestFX::test_ctx_mgr, test/test_fx.py::TestFX::test_custom_codegen, test/test_fx.py::TestFX::test_custom_codegen_with_transformer, test/test_fx.py::TestFX::test_custom_import, test/test_fx.py::TestFX::test_custom_proxy_dynamic_value, test/test_fx.py::TestFX::test_custom_proxy_input_dependent_control_flow, 
test/test_fx.py::TestFX::test_custom_proxy_type, test/test_fx.py::TestFX::test_custom_proxy_type_literal, test/test_fx.py::TestFX::test_custom_traceback_not_raised_when_exception_source_is_submodule, test/test_fx.py::TestFX::test_custom_traceback_raised_when_exception_source_is_graphmodule, test/test_fx.py::TestFX::test_deepcopy_graph_with_tracer_cls, test/test_fx.py::TestFX::test_deepcopy_graphmodule, test/test_fx.py::TestFX::test_deepcopy_graphmodule_with_transform, test/test_fx.py::TestFX::test_deepcopy_no_recursion, test/test_fx.py::TestFX::test_deepcopy_recursion_depth, test/test_fx.py::TestFX::test_deepcopy_tracer, test/test_fx.py::TestFX::test_deepcopy_with_submods_params, test/test_fx.py::TestFX::test_delete_unused_submodules_leaf, test/test_fx.py::TestFX::test_delete_unused_values, test/test_fx.py::TestFX::test_dict, test/test_fx.py::TestFX::test_direct_param_use, test/test_fx.py::TestFX::test_disallow_override, test/test_fx.py::TestFX::test_ellipsis, test/test_fx.py::TestFX::test_empty_graph_codegen, test/test_fx.py::TestFX::test_enum, test/test_fx.py::TestFX::test_erase_node_error, test/test_fx.py::TestFX::test_example_shape_prop, test/test_fx.py::TestFX::test_find_uses, test/test_fx.py::TestFX::test_fn_type_annotation_empty, test/test_fx.py::TestFX::test_fn_type_annotations, test/test_fx.py::TestFX::test_fx_and_or, test/test_fx.py::TestFX::test_fx_create_arg, test/test_fx.py::TestFX::test_fx_shifts, test/test_fx.py::TestFX::test_fx_stateless, test/test_fx.py::TestFX::test_get_torch_func_signature, test/test_fx.py::TestFX::test_getitem, test/test_fx.py::TestFX::test_getitem_subproc, test/test_fx.py::TestFX::test_graph_edit_with_proxy, test/test_fx.py::TestFX::test_graph_fns, test/test_fx.py::TestFX::test_graph_module, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_dict_init, test/test_fx.py::TestFX::test_graph_module_init_buffer_param_copied_mod_init, test/test_fx.py::TestFX::test_graph_module_replicate_for_dp, 
test/test_fx.py::TestFX::test_graph_unique_names, test/test_fx.py::TestFX::test_graph_unique_names_manual, test/test_fx.py::TestFX::test_immutable_dict_pytree_ops, test/test_fx.py::TestFX::test_immutable_list_pytree_ops, test/test_fx.py::TestFX::test_imul_code_print, test/test_fx.py::TestFX::test_inf_nan, test/test_fx.py::TestFX::test_inf_nan_kwds, test/test_fx.py::TestFX::test_informative_co_filename, test/test_fx.py::TestFX::test_inline_graph, test/test_fx.py::TestFX::test_insert_arg, test/test_fx.py::TestFX::test_insertion_point, test/test_fx.py::TestFX::test_interpreter, test/test_fx.py::TestFX::test_interpreter_boxed_run_argument_validation, test/test_fx.py::TestFX::test_interpreter_default_args, test/test_fx.py::TestFX::test_interpreter_gc_values, test/test_fx.py::TestFX::test_interpreter_noop_resnet18, test/test_fx.py::TestFX::test_interpreter_not_enough_args, test/test_fx.py::TestFX::test_interpreter_onthefly_swap, test/test_fx.py::TestFX::test_interpreter_other_graph, test/test_fx.py::TestFX::test_interpreter_partial_eval, test/test_fx.py::TestFX::test_interpreter_run_node_override, test/test_fx.py::TestFX::test_interpreter_star_args, test/test_fx.py::TestFX::test_interpreter_with_codegen, test/test_fx.py::TestFX::test_layout, test/test_fx.py::TestFX::test_leaf_module, test/test_fx.py::TestFX::test_lineno_map, test/test_fx.py::TestFX::test_matmul_tracing, test/test_fx.py::TestFX::test_metadata_on_ph, test/test_fx.py::TestFX::test_module_deepcopy_edit_nodes, test/test_fx.py::TestFX::test_move_before, test/test_fx.py::TestFX::test_multi_insert_point, test/test_fx.py::TestFX::test_multiple_default_args, test/test_fx.py::TestFX::test_named_tuple_inlined, test/test_fx.py::TestFX::test_namedtuple_return_qualname, test/test_fx.py::TestFX::test_namedtuple_return_trace, test/test_fx.py::TestFX::test_native_callable, test/test_fx.py::TestFX::test_nn_module_stack, test/test_fx.py::TestFX::test_no_mutation, test/test_fx.py::TestFX::test_node_tagging, 
test/test_fx.py::TestFX::test_nonetype_annotation, test/test_fx.py::TestFX::test_partial_trace, test/test_fx.py::TestFX::test_pickle_custom_import, test/test_fx.py::TestFX::test_pickle_graphmodule, test/test_fx.py::TestFX::test_pickle_nonetype_annotation, test/test_fx.py::TestFX::test_pickle_torch_custom_ops, test/test_fx.py::TestFX::test_prepend_does_not_leak, test/test_fx.py::TestFX::test_prepend_self, test/test_fx.py::TestFX::test_pretty_print, test/test_fx.py::TestFX::test_pretty_print_graph, test/test_fx.py::TestFX::test_pretty_print_node, test/test_fx.py::TestFX::test_pretty_print_targets, test/test_fx.py::TestFX::test_print_graph, test/test_fx.py::TestFX::test_profiler_multiple_modules, test/test_fx.py::TestFX::test_profiler_nested_graph_modules, test/test_fx.py::TestFX::test_profiler_ranges_side_effect, test/test_fx.py::TestFX::test_profiler_stack_trace_augmentation, test/test_fx.py::TestFX::test_proxy_deepcopy_with_tracer, test/test_fx.py::TestFX::test_proxy_deepcopy_without_tracer, test/test_fx.py::TestFX::test_pytree, test/test_fx.py::TestFX::test_pytree_concrete, test/test_fx.py::TestFX::test_reassign_args_kwargs_uses, test/test_fx.py::TestFX::test_regular_and_default_args, test/test_fx.py::TestFX::test_remove_uses, test/test_fx.py::TestFX::test_remove_uses_with_custom_filter, test/test_fx.py::TestFX::test_replace_input, test/test_fx.py::TestFX::test_replace_uses, test/test_fx.py::TestFX::test_reserved_getattr, test/test_fx.py::TestFX::test_return_tuple, test/test_fx.py::TestFX::test_return_type_exists, test/test_fx.py::TestFX::test_return_type_exists_pre_pep585, test/test_fx.py::TestFX::test_script_method_trace, test/test_fx.py::TestFX::test_script_tensor_constant, test/test_fx.py::TestFX::test_sequential, test/test_fx.py::TestFX::test_shape_prop_aggregate, test/test_fx.py::TestFX::test_shape_prop_layout, test/test_fx.py::TestFX::test_shape_prop_layout_3d, test/test_fx.py::TestFX::test_shape_prop_unbacked_sym, 
test/test_fx.py::TestFX::test_single_default_arg, test/test_fx.py::TestFX::test_snake_case, test/test_fx.py::TestFX::test_sqrt, test/test_fx.py::TestFX::test_stack_traces, test/test_fx.py::TestFX::test_stack_traces_with_transformer, test/test_fx.py::TestFX::test_string_literal_return, test/test_fx.py::TestFX::test_submodule_manipulation_API, test/test_fx.py::TestFX::test_symbolic_trace_assert, test/test_fx.py::TestFX::test_symbolic_trace_sequential, test/test_fx.py::TestFX::test_tensor_attribute, test/test_fx.py::TestFX::test_tensor_attribute_coalseced, test/test_fx.py::TestFX::test_tensor_constant, test/test_fx.py::TestFX::test_throw_out_variant, test/test_fx.py::TestFX::test_torch_custom_ops, test/test_fx.py::TestFX::test_torch_fx_getattr, test/test_fx.py::TestFX::test_torch_fx_len, test/test_fx.py::TestFX::test_torch_op_overloads, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx, test/test_fx.py::TestFX::test_torchbind_class_attribute_in_fx_tensor_arg, test/test_fx.py::TestFX::test_trace_buffer_slice, test/test_fx.py::TestFX::test_trace_dict_int_keys, test/test_fx.py::TestFX::test_trace_dict_proxy_keys, test/test_fx.py::TestFX::test_trace_fn_constant, test/test_fx.py::TestFX::test_trace_function, test/test_fx.py::TestFX::test_trace_multiple_funcs, test/test_fx.py::TestFX::test_trace_return_dataclass, test/test_fx.py::TestFX::test_trace_return_dataclass_nested, test/test_fx.py::TestFX::test_trace_return_namedtuple, test/test_fx.py::TestFX::test_tracing_graphmodules_as_leaf_submodules, test/test_fx.py::TestFX::test_transformer_multi_outputs, test/test_fx.py::TestFX::test_transformer_noop, test/test_fx.py::TestFX::test_transformer_op_swap, test/test_fx.py::TestFX::test_transformer_preserves_nn_module_stack_for_get_attr, test/test_fx.py::TestFX::test_tuple_no_subscript, test/test_fx.py::TestFX::test_typename_print, test/test_fx.py::TestFX::test_typename_print_pre_pep585, test/test_fx.py::TestFX::test_typename_print_union, 
test/test_fx.py::TestFX::test_unpack, test/test_fx.py::TestFX::test_unpack_dict_better_error, test/test_fx.py::TestFX::test_unpack_list_better_error, test/test_fx.py::TestFX::test_update_args_api, test/test_fx.py::TestFX::test_update_args_kwargs_yells_at_you, test/test_fx.py::TestFX::test_update_kwargs_api, test/test_fx.py::TestFX::test_user_friendly_call_provenance_with_function, test/test_fx.py::TestFX::test_user_friendly_call_provenance_with_module, test/test_fx.py::TestFX::test_varargs_concrete, test/test_fx.py::TestFX::test_wrap, test/test_fx.py::TestFX::test_wrap_decorated_function, test/test_fx.py::TestFX::test_wrap_fn_directly, test/test_fx.py::TestFX::test_wrap_with_submodule, test/test_fx.py::TestFX::test_wrapped_method, test/test_fx.py::TestFX::test_wrapped_retrace, test/test_fx.py::TestFX::test_wrapped_via_decorator, test/test_fx.py::TestFX::test_wrapped_via_decorator_and_transformed, test/test_fx.py::TestFX::test_wrong_target_type, test/test_fx.py::TestFX::test_wrong_topo, test/test_fx.py::TestFXAPIBackwardCompatibility::test_adding_side_effect_function, test/test_fx.py::TestFXAPIBackwardCompatibility::test_class_member_back_compat, test/test_fx.py::TestFXAPIBackwardCompatibility::test_function_back_compat, test/test_fx.py::TestFXAPIBackwardCompatibility::test_preserve_unused_attr_after_unpickle, test/test_fx.py::TestFXAPIBackwardCompatibility::test_public_api_surface, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool1d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool2d_with_indices, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_adaptive_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_affine_grid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_avg_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_batch_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_bilinear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_binary_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_binary_cross_entropy_with_logits, test/test_fx.py::TestFunctionalTracing::test_nn_functional_celu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_celu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_channel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_tbc, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_conv_transpose3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cosine_similarity, test/test_fx.py::TestFunctionalTracing::test_nn_functional_cross_entropy, test/test_fx.py::TestFunctionalTracing::test_nn_functional_ctc_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout1d, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_dropout3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_elu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_elu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_embedding, test/test_fx.py::TestFunctionalTracing::test_nn_functional_embedding_bag, test/test_fx.py::TestFunctionalTracing::test_nn_functional_feature_alpha_dropout, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_fractional_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gaussian_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_glu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_grid_sample, test/test_fx.py::TestFunctionalTracing::test_nn_functional_group_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_grouped_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_gumbel_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardswish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardtanh, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hardtanh_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_hinge_embedding_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_huber_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_instance_norm, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_interpolate, test/test_fx.py::TestFunctionalTracing::test_nn_functional_kl_div, test/test_fx.py::TestFunctionalTracing::test_nn_functional_l1_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_layer_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_leaky_relu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_leaky_relu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_linear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_local_response_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_log_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_logsigmoid, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_lp_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_margin_ranking_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool1d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool2d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_pool3d_with_indices, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool1d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool2d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_max_unpool3d, test/test_fx.py::TestFunctionalTracing::test_nn_functional_mish, test/test_fx.py::TestFunctionalTracing::test_nn_functional_mse_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_head_attention_forward, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multi_margin_loss, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_multilabel_soft_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_native_channel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_normalize, test/test_fx.py::TestFunctionalTracing::test_nn_functional_one_hot, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pad, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pairwise_distance, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pdist, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_shuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_pixel_unshuffle, test/test_fx.py::TestFunctionalTracing::test_nn_functional_poisson_nll_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_prelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu6, test/test_fx.py::TestFunctionalTracing::test_nn_functional_relu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rms_norm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_rrelu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_dot_product_attention, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_grouped_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_scaled_mm, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_selu_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_silu, test/test_fx.py::TestFunctionalTracing::test_nn_functional_smooth_l1_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_soft_margin_loss, 
test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmax, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softmin, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softplus, test/test_fx.py::TestFunctionalTracing::test_nn_functional_softshrink, test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_threshold_, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_triplet_margin_with_distance_loss, test/test_fx.py::TestFunctionalTracing::test_nn_functional_unfold, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample_bilinear, test/test_fx.py::TestFunctionalTracing::test_nn_functional_upsample_nearest, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_H_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_T_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___getitem___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___radd___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rdiv___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmatmul___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmod___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rmul___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rpow___cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive___rsub___cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__batch_norm_with_update_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__chunk_cat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__native_batch_norm_legit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_lengths_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__segment_reduce_offsets_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__softmax_backward_data_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__unsafe_masked_index_put_accumulate_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive__upsample_bilinear2d_aa_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_abs_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_acosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addbmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcdiv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addcmul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmm_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmm_decomposed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addmv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_addr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_alias_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_all_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_allclose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_aminmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_angle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_any_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_arange_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argsort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_argwhere_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_partial_views_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_as_strided_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_asin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_asinh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atan2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_atleast_3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_baddbmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bernoulli_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bfloat16_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_block_diag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bool_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_shapes_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_tensors_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_broadcast_to_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_bucketize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_byte_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cartesian_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cauchy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cdist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cdouble_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ceil_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cfloat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chalf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_char_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_inverse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cholesky_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_chunk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_max_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clamp_min_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_clone_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_column_stack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_combinations_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_complex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_conj_physical_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_constant_pad_nd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_contiguous_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_copysign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_corrcoef_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cos_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cosh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_count_nonzero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cov_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cross_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cummax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cummin_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumprod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_cumulative_trapezoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_deg2rad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diag_embed_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagflat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diagonal_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_diff_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_digamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_floor_rounding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_no_rounding_mode_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_div_trunc_rounding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_double_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_dstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_einsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_permuted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_empty_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eq_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_equal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erfc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_erfinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expand_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_expm1_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_exponential_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_eye_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_fftshift_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_hfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ifftshift_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_ihfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfft_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_irfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fft_rfftn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flatten_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flip_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fliplr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_flipud_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_float_power_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_floor_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_fmod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_frac_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_frexp_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_full_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_full_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gather_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ge_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geometric_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_geqrf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gradient_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_grid_sampler_2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_grid_sampler_3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_gt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_half_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hash_tensor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_heaviside_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_histc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_hypot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_i0_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_igamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_igammac_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_put_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_index_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_inner_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_int_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isclose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isfinite_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isnan_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isneginf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isposinf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_isreal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_item_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_2inputs_2outputs_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_4inputs_with_extra_args_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_binary_return_by_ref_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_jiterator_unary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kron_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_kthvalue_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ldexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_le_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lerp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lgamma_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cholesky_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cholesky_ex_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cond_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_cross_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_det_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_diagonal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eig_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvals_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_eigvalsh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_householder_product_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_inv_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_factor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_factor_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_ldl_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lstsq_grad_oriented_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_factor_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_lu_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_power_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_matrix_rank_hermitian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_multi_dot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_norm_subgradients_at_zero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_hermitian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_pinv_singular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_qr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_slogdet_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_ex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_solve_triangular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_svdvals_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorinv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_tensorsolve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vander_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vecdot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linalg_vector_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_linspace_tensor_overload_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log10_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log1p_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_normal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_softmax_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_log_softmax_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logcumsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logdet_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_and_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_not_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_or_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logical_xor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logspace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logspace_tensor_overload_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_long_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_lu_unpack_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mH_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mT_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_argmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_argmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumprod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_cumsum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_fill_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_log_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logaddexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_logsumexp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_median_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_prod_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_softmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_masked_var_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matmul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_matrix_exp_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_pool2d_with_indices_backward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_reduction_no_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_max_reduction_with_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_maximum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_median_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_list_of_tensors_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_meshgrid_variadic_tensors_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_binary_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_no_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_min_reduction_with_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_minimum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mode_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_movedim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_msort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mul_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_multinomial_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mv_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nan_to_num_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmean_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanmedian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nanquantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nansum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_narrow_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_dropout_backward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_native_layer_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ne_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_neg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_empty_strided_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_full_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_ones_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_new_zeros_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nextafter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool1d_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_alpha_dropout_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_avg_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_batch_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_binary_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_celu_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_channel_shuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_conv_transpose3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cosine_embedding_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cosine_similarity_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_cross_entropy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_ctc_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_dropout_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_elu_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_bag_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_embedding_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_fractional_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_fractional_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_gaussian_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_gelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_glu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_grid_sample_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_group_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardsigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardswish_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hardtanh_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_hinge_embedding_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_huber_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_instance_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_area_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_bicubic_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_linear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest-exact_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_nearest_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_interpolate_trilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_kl_div_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_l1_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_layer_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_leaky_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_linear_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_local_response_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_logsigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_margin_ranking_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_pool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool1d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool1d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool2d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool2d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_max_unpool3d_grad_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_mish_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_mse_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multi_head_attention_forward_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multi_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multilabel_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_nll_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_normalize_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_circular_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_constant_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_reflect_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_replicate_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pad_replicate_negative_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pairwise_distance_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pdist_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pixel_shuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_pixel_unshuffle_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_poisson_nll_loss_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_prelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu6_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_relu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rms_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_rrelu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_scaled_dot_product_attention_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_selu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_silu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_smooth_l1_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_soft_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softmin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softmin_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softplus_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softshrink_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_softsign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_tanhshrink_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_threshold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_unfold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_upsample_bilinear_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nn_functional_upsample_nearest_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_nonzero_static_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_fro_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_inf_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_norm_nuc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_in_place_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_normal_number_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ones_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ones_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ormqr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_outer_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pca_lowrank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_permute_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_permute_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pinverse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polar_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_2_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_polygamma_polygamma_n_4_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_positive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_pow_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_put_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_qr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_quantile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rad2deg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rand_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randint_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randint_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_randn_like_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_ravel_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_real_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reciprocal_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_remainder_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_renorm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_repeat_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_repeat_interleave_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reshape_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_reshape_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resize__cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resize_as__cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resolve_conj_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_resolve_neg_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_roll_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rot90_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_round_decimals_neg_3_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rsqrt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_rsub_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scalar_tensor_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_add_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_amax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_amin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_mean_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_prod_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_scatter_reduce_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_searchsorted_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_select_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_select_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sgn_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_short_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sigmoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sign_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_bartlett_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_blackman_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_exponential_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_gaussian_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_general_cosine_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_general_hamming_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hamming_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_hann_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_kaiser_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signal_windows_nuttall_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_signbit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sin_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sinc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sinh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_slice_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_slice_scatter_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_softmax_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_softmax_with_dtype_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sort_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sparse_mm_reduce_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sparse_sampled_addmm_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_airy_ai_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_j0_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_j1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_bessel_y1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_u_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_v_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_chebyshev_polynomial_w_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_entr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_erfcx_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_h_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_hermite_polynomial_he_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i0e_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_i1e_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_laguerre_polynomial_l_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_legendre_polynomial_p_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_log_ndtr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_i1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_modified_bessel_k1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_ndtr_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_ndtri_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_scaled_modified_bessel_k0_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_scaled_modified_bessel_k1_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_spherical_bessel_j0_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_xlog1py_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_special_zeta_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_list_args_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_with_sizes_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_split_with_sizes_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sqrt_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_square_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_squeeze_multiple_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_mean_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_std_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_stft_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sub_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_sum_to_size_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_svd_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_svd_lowrank_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_t_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_t_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_along_dim_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_take_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tan_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tanh_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tensor_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tensordot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tile_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_to_sparse_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_topk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__efficient_attention_forward_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_torch_ops_aten__safe_softmax_default_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trace_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_transpose_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trapezoid_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trapz_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triangular_solve_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_tril_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_triu_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_true_divide_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_trunc_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unbind_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unflatten_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unfold_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unfold_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_uniform_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_consecutive_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unique_cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsafe_chunk_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsafe_split_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_unsqueeze_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_mean_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_mean_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_var_unbiased_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vdot_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_as_complex_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_as_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_copy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_view_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vsplit_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_vstack_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_where_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_xlogy_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zero__cuda_float32, 
test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zeros_cuda_float32, test/test_fx.py::TestOperatorSignaturesCUDA::test_get_torch_func_signature_exhaustive_zeros_like_cuda_float32, test/test_fx.py::TestVisionTracing::test_torchvision_models_alexnet, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_base, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_small, test/test_fx.py::TestVisionTracing::test_torchvision_models_convnext_tiny, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet121, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet161, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet169, test/test_fx.py::TestVisionTracing::test_torchvision_models_densenet201, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_320_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_mobilenet_v3_large_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fasterrcnn_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_fcos_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_keypointrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_maskrcnn_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_maskrcnn_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_retinanet_resnet50_fpn_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_ssd300_vgg16, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_detection_ssdlite320_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b0, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b1, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b2, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b3, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b4, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b5, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b6, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_b7, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_l, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_m, test/test_fx.py::TestVisionTracing::test_torchvision_models_efficientnet_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_googlenet, test/test_fx.py::TestVisionTracing::test_torchvision_models_inception_v3, test/test_fx.py::TestVisionTracing::test_torchvision_models_maxvit_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet0_75, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_mnasnet1_3, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v2, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_mobilenet_v3_small, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_16gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_1_6gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_32gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_3_2gf, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_400mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_800mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_x_8gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_128gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_16gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_1_6gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_32gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_3_2gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_400mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_800mf, test/test_fx.py::TestVisionTracing::test_torchvision_models_regnet_y_8gf, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet152, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet18, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet34, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_32x8d, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext101_64x4d, test/test_fx.py::TestVisionTracing::test_torchvision_models_resnext50_32x4d, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_mobilenet_v3_large, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_deeplabv3_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_fcn_resnet101, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_fcn_resnet50, test/test_fx.py::TestVisionTracing::test_torchvision_models_segmentation_lraspp_mobilenet_v3_large, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x0_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x1_5, test/test_fx.py::TestVisionTracing::test_torchvision_models_shufflenet_v2_x2_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_squeezenet1_0, test/test_fx.py::TestVisionTracing::test_torchvision_models_squeezenet1_1, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_swin_v2_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg11_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg13_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg16_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg19, test/test_fx.py::TestVisionTracing::test_torchvision_models_vgg19_bn, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mc3_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v1_b, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_mvit_v2_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_r2plus1d_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_r3d_18, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_s3d, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_b, 
test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_s, test/test_fx.py::TestVisionTracing::test_torchvision_models_video_swin3d_t, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_b_32, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_h_14, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_16, test/test_fx.py::TestVisionTracing::test_torchvision_models_vit_l_32, test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet101_2, test/test_fx.py::TestVisionTracing::test_torchvision_models_wide_resnet50_2
2025-12-04T16:08:24.5459822Z 
2025-12-04T16:08:24.5460114Z Finished test_fx 1/1 ... [2025-12-04 16:08:24.432331][24862.042235719], took 4.24min
2025-12-04T16:08:24.5461105Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_fx/test_fx-d5755757c0de9fe5.xml
2025-12-04T16:08:24.5796081Z Running test_autocast 1/1 ... [2025-12-04 16:08:24.579363][24862.189272814]
2025-12-04T16:08:24.5796626Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:08:24.5799848Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_autocast.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:08:24.579764]
2025-12-04T16:08:33.1058280Z 
2025-12-04T16:08:33.1059180Z test_autocast 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_autocast_1.1_7cd62703ceb14b05_.log
2025-12-04T16:08:33.1066904Z Running 20 items in this shard: test/test_autocast.py::TestAutocastCPU::test_autocast_disabled_with_fp32_dtype, test/test_autocast.py::TestAutocastCPU::test_autocast_methods_expect_builtin_promote, test/test_autocast.py::TestAutocastCPU::test_autocast_nn_16, test/test_autocast.py::TestAutocastCPU::test_autocast_nn_fp32, test/test_autocast.py::TestAutocastCPU::test_autocast_rnn, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_16, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_expect_builtin_promote, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_fp32, test/test_autocast.py::TestAutocastCPU::test_autocast_torch_need_autocast_promote, test/test_autocast.py::TestAutocastCPU::test_cpu_autocast_deprecated_warning, test/test_autocast.py::TestAutocastCPU::test_generic_autocast, test/test_autocast.py::TestAutocastGPU::test_autocast_prioritize, test/test_autocast.py::TestAutocastGPU::test_cache_disabled, test/test_autocast.py::TestAutocastGPU::test_cast_cache_is_global, test/test_autocast.py::TestAutocastMPS::test_cast_cache_is_global, test/test_autocast.py::TestAutocastMPS::test_mps_autocast_bfloat16_supported, test/test_autocast.py::TestAutocastMPS::test_mps_autocast_error_message, test/test_autocast.py::TestTorchAutocast::test_autocast_fast_dtype, test/test_autocast.py::TestTorchAutocast::test_invalid_device, test/test_autocast.py::TestTorchAutocast::test_non_string_device
2025-12-04T16:08:33.1074177Z 
2025-12-04T16:08:33.1074471Z Finished test_autocast 1/1 ... [2025-12-04 16:08:33.105650][24870.715558004], took 0.14min
2025-12-04T16:08:33.1459953Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_autocast/test_autocast-fd8082499cdeffdb.xml
2025-12-04T16:08:33.3108290Z Running test_logging 1/1 ... [2025-12-04 16:08:33.310565][24870.920474056]
2025-12-04T16:08:33.3108818Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:08:33.3112567Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_logging.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:08:33.310977]
2025-12-04T16:08:40.5854058Z 
2025-12-04T16:08:40.5855019Z test_logging 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_logging_1.1_4a28eee8affd86e2_.log
2025-12-04T16:08:40.5856032Z Running 1 items in this shard: test/test_logging.py::LoggingTest::testApiUsage
2025-12-04T16:08:40.5856461Z 
2025-12-04T16:08:40.5856758Z Finished test_logging 1/1 ... [2025-12-04 16:08:40.585233][24878.195140856], took 0.12min
2025-12-04T16:08:40.6256420Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_logging/test_logging-07e1a05cccd3a8b9.xml
2025-12-04T16:08:40.6919390Z Running test_python_dispatch 1/1 ... [2025-12-04 16:08:40.691657][24878.301565813]
2025-12-04T16:08:40.6919980Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:08:40.6922727Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_python_dispatch.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:08:40.692033]
2025-12-04T16:08:52.9734709Z 
2025-12-04T16:08:52.9735866Z test_python_dispatch 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_python_dispatch_1.1_4a43d809046600b7_.log
2025-12-04T16:08:52.9790687Z Running 119 items in this shard: test/test_python_dispatch.py::TestDispatcherPythonBindings::test_call_boxed, test/test_python_dispatch.py::TestPythonRegistration::test_alias_analysis, test/test_python_dispatch.py::TestPythonRegistration::test_create_new_library, test/test_python_dispatch.py::TestPythonRegistration::test_create_new_library_fragment_no_existing, test/test_python_dispatch.py::TestPythonRegistration::test_create_new_library_fragment_with_existing, test/test_python_dispatch.py::TestPythonRegistration::test_dispatcher_error_filenames, test/test_python_dispatch.py::TestPythonRegistration::test_dispatchkeyset_eq, test/test_python_dispatch.py::TestPythonRegistration::test_dispatchkeyset_pickle, test/test_python_dispatch.py::TestPythonRegistration::test_error_for_unsupported_ns_or_kind, test/test_python_dispatch.py::TestPythonRegistration::test_error_if_fn_not_callable, test/test_python_dispatch.py::TestPythonRegistration::test_extend_library_with_dispatch_key_arg, test/test_python_dispatch.py::TestPythonRegistration::test_fallback, test/test_python_dispatch.py::TestPythonRegistration::test_fallback_fallthrough, test/test_python_dispatch.py::TestPythonRegistration::test_fallback_keyset, test/test_python_dispatch.py::TestPythonRegistration::test_fallthrough_for_dense_key_with_meta_in_tls, test/test_python_dispatch.py::TestPythonRegistration::test_finalizer, test/test_python_dispatch.py::TestPythonRegistration::test_override_aten_ops_with_multiple_libraries, test/test_python_dispatch.py::TestPythonRegistration::test_override_cpu_sum, test/test_python_dispatch.py::TestPythonRegistration::test_override_cuda_with_jiterator, test/test_python_dispatch.py::TestPythonRegistration::test_register_fallthrough, test/test_python_dispatch.py::TestPythonRegistration::test_returning_symint, test/test_python_dispatch.py::TestPythonDispatch::test_all_same_mode, test/test_python_dispatch.py::TestPythonDispatch::test_autograd_in_attr, 
test/test_python_dispatch.py::TestPythonDispatch::test_basic, test/test_python_dispatch.py::TestPythonDispatch::test_capture_logs_with_torch_dispatch_mode, test/test_python_dispatch.py::TestPythonDispatch::test_construct_int_tensor, test/test_python_dispatch.py::TestPythonDispatch::test_custom_autograd, test/test_python_dispatch.py::TestPythonDispatch::test_custom_dispatch_mode_not_supports_higher_order_operators, test/test_python_dispatch.py::TestPythonDispatch::test_custom_dispatch_mode_supports_higher_order_operators, test/test_python_dispatch.py::TestPythonDispatch::test_custom_size_policy_dynamic_shapes, test/test_python_dispatch.py::TestPythonDispatch::test_data_ptr_respects_numel_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_deepcopy_non_wrapper_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_deepcopy_wrapper_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_deepcopy_wrapper_subclass_with_clone_returning_different_type, test/test_python_dispatch.py::TestPythonDispatch::test_detach_appears_once_when_called_once, test/test_python_dispatch.py::TestPythonDispatch::test_device_slowpath, test/test_python_dispatch.py::TestPythonDispatch::test_dim_slowpath, test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_super_call, test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_super_call_list_arg, test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_super_dont_autograd, test/test_python_dispatch.py::TestPythonDispatch::test_dispatch_uint64, test/test_python_dispatch.py::TestPythonDispatch::test_error_using_class_method_on_mode, test/test_python_dispatch.py::TestPythonDispatch::test_exception_handling, test/test_python_dispatch.py::TestPythonDispatch::test_fancy_strides, test/test_python_dispatch.py::TestPythonDispatch::test_format, test/test_python_dispatch.py::TestPythonDispatch::test_get_cur_mode, test/test_python_dispatch.py::TestPythonDispatch::test_get_mode_stack, 
test/test_python_dispatch.py::TestPythonDispatch::test_index_put_where_only_index_is_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_invalid_ret, test/test_python_dispatch.py::TestPythonDispatch::test_is_contiguous_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_kwarg_only, test/test_python_dispatch.py::TestPythonDispatch::test_kwarg_only_and_positional_default, test/test_python_dispatch.py::TestPythonDispatch::test_layout_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_like, test/test_python_dispatch.py::TestPythonDispatch::test_list_ret, test/test_python_dispatch.py::TestPythonDispatch::test_make_fx_with_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_make_subclass_with_modes, test/test_python_dispatch.py::TestPythonDispatch::test_make_wrapper_subclass_noalloc, test/test_python_dispatch.py::TestPythonDispatch::test_make_wrapper_subclass_propagates_metadata, test/test_python_dispatch.py::TestPythonDispatch::test_maybe_tuple_bug, test/test_python_dispatch.py::TestPythonDispatch::test_mode_detection, test/test_python_dispatch.py::TestPythonDispatch::test_mode_with_make_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_multiple_ops_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_nested_push_logging_tensor_mode, test/test_python_dispatch.py::TestPythonDispatch::test_nesting_same_mode, test/test_python_dispatch.py::TestPythonDispatch::test_new_ones, test/test_python_dispatch.py::TestPythonDispatch::test_none_wrapping, test/test_python_dispatch.py::TestPythonDispatch::test_notimplemented_mode, test/test_python_dispatch.py::TestPythonDispatch::test_optional_tensor_list, test/test_python_dispatch.py::TestPythonDispatch::test_out, test/test_python_dispatch.py::TestPythonDispatch::test_produce_real_type, test/test_python_dispatch.py::TestPythonDispatch::test_record_stream, 
test/test_python_dispatch.py::TestPythonDispatch::test_return_and_correct_aliasing_gives_correct_stride, test/test_python_dispatch.py::TestPythonDispatch::test_return_stream, test/test_python_dispatch.py::TestPythonDispatch::test_set_data, test/test_python_dispatch.py::TestPythonDispatch::test_shallow_copy_and_detach, test/test_python_dispatch.py::TestPythonDispatch::test_sizes_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_standard_is_not_subclass, test/test_python_dispatch.py::TestPythonDispatch::test_storage, test/test_python_dispatch.py::TestPythonDispatch::test_storage_can_be_converted_to_python_object, test/test_python_dispatch.py::TestPythonDispatch::test_strides_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_subclass_creation, test/test_python_dispatch.py::TestPythonDispatch::test_subclass_priority, test/test_python_dispatch.py::TestPythonDispatch::test_sym_sizes_strides_slow_path, test/test_python_dispatch.py::TestPythonDispatch::test_tolist_numpy_with_torch_dispatch_mode, test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_basic, test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_respects_no_dispatch, test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_subclass_priority, test/test_python_dispatch.py::TestPythonDispatch::test_torch_dispatch_mode_unrelated_tensors, test/test_python_dispatch.py::TestPythonDispatch::test_version, test/test_python_dispatch.py::TestPythonDispatch::test_view_returns_alias_under_torch_dispatch, test/test_python_dispatch.py::TestPythonDispatch::test_with_mode_created_separately, test/test_python_dispatch.py::TestPythonDispatch::test_with_nested_modes, test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_extra_dispatch_keys, test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_multiprocessing_preserves_dtype, 
test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_reentrant_dispatch_with_mode, test/test_python_dispatch.py::TestPythonDispatch::test_wrapper_subclass_serializes, test/test_python_dispatch.py::TestPythonDispatcher::test_basic, test/test_python_dispatch.py::TestPythonDispatcher::test_lstsq, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_cat_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_conv2d_cuda, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyCatCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyCubeCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyMulCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyMulScalarCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyNMSCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyNonzeroCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpySortCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpySplitCopyCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpySplitCopyWithIntCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyTakeCustomOp_cuda_float32, 
test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_custom_NumpyViewCopyCustomOp_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_fft_fft2_cuda, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_mul_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_native_batch_norm_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_out_op_cuda, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_split_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_split_list_args_cuda_float32, test/test_python_dispatch.py::TestWrapperSubclassAliasingCUDA::test_wrapper_subclass_aliasing_view_cuda_float32
2025-12-04T16:08:52.9843780Z 
2025-12-04T16:08:52.9844179Z Finished test_python_dispatch 1/1 ... [2025-12-04 16:08:52.973466][24890.583372604], took 0.20min
2025-12-04T16:08:53.0142764Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_python_dispatch/test_python_dispatch-e290291b25b2a739.xml
2025-12-04T16:08:53.0870231Z Running nn/test_lazy_modules 1/1 ... [2025-12-04 16:08:53.086762][24890.696669953]
2025-12-04T16:08:53.0870782Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:08:53.0874034Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_lazy_modules.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:08:53.087181]
2025-12-04T16:09:00.6124373Z 
2025-12-04T16:09:00.6125418Z nn/test_lazy_modules 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_lazy_modules_1.1_641ede76abd1387b_.log
2025-12-04T16:09:00.6147852Z Running 59 items in this shard: test/nn/test_lazy_modules.py::TestLazyModules::test_chained_initialization, test/nn/test_lazy_modules.py::TestLazyModules::test_invalid_functions, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm1d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm1d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm1d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm2d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm2d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm2d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm3d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm3d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm3d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_batchnorm_with_dict_input, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv1d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv1d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv1d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv2d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv2d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv2d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv3d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv3d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv3d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose1d_kwargs, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose1d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose1d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose2d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose2d_kwargs, 
test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose2d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose2d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose3d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose3d_kwargs, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose3d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transpose3d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_conv_transposed1d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_forward_hook, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm1d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm1d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm1d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm2d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm2d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm2d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm3d, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm3d_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_instancenorm3d_state, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_linear_pickle, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_linear_state_and_forward, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_module_buffer, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_module_jit_buffer, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_module_jit_param, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_module_parameter, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_pre_forward_hook, test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_share_memory_buffer, 
test/nn/test_lazy_modules.py::TestLazyModules::test_lazy_share_memory_param, test/nn/test_lazy_modules.py::TestLazyModules::test_linear, test/nn/test_lazy_modules.py::TestLazyModules::test_linear_state, test/nn/test_lazy_modules.py::TestLazyModules::test_materialize_device, test/nn/test_lazy_modules.py::TestLazyModules::test_materialize_dtype, test/nn/test_lazy_modules.py::TestLazyModules::test_optimizer_pass, test/nn/test_lazy_modules.py::TestLazyModules::test_spectral_norm, test/nn/test_lazy_modules.py::TestLazyModules::test_weight_norm
2025-12-04T16:09:00.6169373Z 
2025-12-04T16:09:00.6169701Z Finished nn/test_lazy_modules 1/1 ... [2025-12-04 16:09:00.612299][24898.222207809], took 0.13min
2025-12-04T16:09:00.6536884Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_lazy_modules/nn.test_lazy_modules-90c11bd89c9c9697.xml
2025-12-04T16:09:00.7620411Z Running nn/test_pruning 1/1 ... [2025-12-04 16:09:00.761781][24898.371689445]
2025-12-04T16:09:00.7620934Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:09:00.7624365Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_pruning.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:00.762178]
2025-12-04T16:09:06.1844050Z 
2025-12-04T16:09:06.1844989Z nn/test_pruning 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_pruning_1.1_fc4532e556fbe9d9_.log
2025-12-04T16:09:06.1857582Z Running 34 items in this shard: test/nn/test_pruning.py::TestPruningNN::test_compute_nparams_to_prune, test/nn/test_pruning.py::TestPruningNN::test_custom_from_mask_pruning, test/nn/test_pruning.py::TestPruningNN::test_global_pruning, test/nn/test_pruning.py::TestPruningNN::test_global_pruning_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_identity_pruning, test/nn/test_pruning.py::TestPruningNN::test_l1_unstructured_pruning, test/nn/test_pruning.py::TestPruningNN::test_l1_unstructured_pruning_with_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_ln_structured_pruning, test/nn/test_pruning.py::TestPruningNN::test_ln_structured_pruning_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_multiple_pruning_calls, test/nn/test_pruning.py::TestPruningNN::test_prune, test/nn/test_pruning.py::TestPruningNN::test_prune_importance_scores, test/nn/test_pruning.py::TestPruningNN::test_prune_importance_scores_mimic_default, test/nn/test_pruning.py::TestPruningNN::test_pruning_container, test/nn/test_pruning.py::TestPruningNN::test_pruning_container_compute_mask, test/nn/test_pruning.py::TestPruningNN::test_pruning_id_consistency, test/nn/test_pruning.py::TestPruningNN::test_pruning_rollback, test/nn/test_pruning.py::TestPruningNN::test_pruning_serialization_model, test/nn/test_pruning.py::TestPruningNN::test_pruning_serialization_state_dict, test/nn/test_pruning.py::TestPruningNN::test_random_pruning, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_0perc, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_forward, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_new_weight, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_orig, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_pickle, test/nn/test_pruning.py::TestPruningNN::test_random_pruning_sizes, test/nn/test_pruning.py::TestPruningNN::test_random_structured_pruning_amount, 
test/nn/test_pruning.py::TestPruningNN::test_remove_pruning, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning_exception, test/nn/test_pruning.py::TestPruningNN::test_remove_pruning_forward, test/nn/test_pruning.py::TestPruningNN::test_rnn_pruning, test/nn/test_pruning.py::TestPruningNN::test_unstructured_pruning_same_magnitude, test/nn/test_pruning.py::TestPruningNN::test_validate_pruning_amount, test/nn/test_pruning.py::TestPruningNN::test_validate_pruning_amount_init
2025-12-04T16:09:06.1869596Z 
2025-12-04T16:09:06.1869908Z Finished nn/test_pruning 1/1 ... [2025-12-04 16:09:06.184249][24903.794157783], took 0.09min
2025-12-04T16:09:06.2253781Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_pruning/nn.test_pruning-e4f9b7a61d3080de.xml
2025-12-04T16:09:06.2719161Z Running test_monitor 1/1 ... [2025-12-04 16:09:06.271644][24903.88155216]
2025-12-04T16:09:06.2719685Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:09:06.2723002Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_monitor.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:06.272033]
2025-12-04T16:09:11.7942491Z 
2025-12-04T16:09:11.7943409Z test_monitor 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_monitor_1.1_60acff8e80cf96a3_.log
2025-12-04T16:09:11.7945768Z Running 6 items in this shard: test/test_monitor.py::TestMonitor::test_event_handler, test/test_monitor.py::TestMonitor::test_fixed_count_stat, test/test_monitor.py::TestMonitor::test_interval_stat, test/test_monitor.py::TestMonitor::test_log_event, test/test_monitor.py::TestMonitor::test_wait_counter, test/test_monitor.py::TestMonitorTensorboard::test_event_handler
2025-12-04T16:09:11.7947551Z 
2025-12-04T16:09:11.7947835Z Finished test_monitor 1/1 ... [2025-12-04 16:09:11.794058][24909.403967592], took 0.09min
2025-12-04T16:09:11.8352709Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_monitor/test_monitor-821063f2b7915ea1.xml
2025-12-04T16:09:11.8791658Z Running test_cuda_sanitizer 1/1 ... [2025-12-04 16:09:11.878949][24909.488857898]
2025-12-04T16:09:11.8792196Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:09:11.8795907Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cuda_sanitizer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:11.879344]
2025-12-04T16:09:19.1537487Z 
2025-12-04T16:09:19.1538532Z test_cuda_sanitizer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cuda_sanitizer_1.1_06ff5e3bcde71deb_.log
2025-12-04T16:09:19.1549874Z Running 31 items in this shard: test/test_cuda_sanitizer.py::TestArgumentHandler::test_add, test/test_cuda_sanitizer.py::TestArgumentHandler::test_cat, test/test_cuda_sanitizer.py::TestArgumentHandler::test_inplace, test/test_cuda_sanitizer.py::TestArgumentHandler::test_nonzero, test/test_cuda_sanitizer.py::TestArgumentHandler::test_out, test/test_cuda_sanitizer.py::TestArgumentHandler::test_split, test/test_cuda_sanitizer.py::TestArgumentHandler::test_tensor_names, test/test_cuda_sanitizer.py::TestEventHandler::test_all_reads_checked_failing, test/test_cuda_sanitizer.py::TestEventHandler::test_all_reads_checked_passing, test/test_cuda_sanitizer.py::TestEventHandler::test_branch_sync, test/test_cuda_sanitizer.py::TestEventHandler::test_chain_sync, test/test_cuda_sanitizer.py::TestEventHandler::test_correct_state_merging, test/test_cuda_sanitizer.py::TestEventHandler::test_deleted_record, test/test_cuda_sanitizer.py::TestEventHandler::test_device_synchronization_expired, test/test_cuda_sanitizer.py::TestEventHandler::test_device_synchronize, test/test_cuda_sanitizer.py::TestEventHandler::test_empty_kernel_launch, test/test_cuda_sanitizer.py::TestEventHandler::test_event_synchronize, test/test_cuda_sanitizer.py::TestEventHandler::test_expired_record, test/test_cuda_sanitizer.py::TestEventHandler::test_multiple_errors, test/test_cuda_sanitizer.py::TestEventHandler::test_multiple_wait, test/test_cuda_sanitizer.py::TestEventHandler::test_new_stream_is_synchronized, test/test_cuda_sanitizer.py::TestEventHandler::test_reads_check_last_write, test/test_cuda_sanitizer.py::TestEventHandler::test_record_override, test/test_cuda_sanitizer.py::TestEventHandler::test_simple_error, test/test_cuda_sanitizer.py::TestEventHandler::test_simple_passing, test/test_cuda_sanitizer.py::TestEventHandler::test_simple_sync, test/test_cuda_sanitizer.py::TestEventHandler::test_stream_synchronize, 
test/test_cuda_sanitizer.py::TestMessages::test_ensure_does_not_exist, test/test_cuda_sanitizer.py::TestMessages::test_ensure_exists, test/test_cuda_sanitizer.py::TestMessages::test_error_message, test/test_cuda_sanitizer.py::TestMessages::test_subclass
2025-12-04T16:09:19.1560473Z 
2025-12-04T16:09:19.1560817Z Finished test_cuda_sanitizer 1/1 ... [2025-12-04 16:09:19.153598][24916.763505435], took 0.12min
2025-12-04T16:09:19.1950515Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cuda_sanitizer/test_cuda_sanitizer-32e74fc9c7695511.xml
2025-12-04T16:09:19.2776051Z Running test_bundled_inputs 1/1 ... [2025-12-04 16:09:19.277309][24916.887217795]
2025-12-04T16:09:19.2776609Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:09:19.2779464Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_bundled_inputs.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:19.277709]
2025-12-04T16:09:25.4516949Z 
2025-12-04T16:09:25.4517912Z test_bundled_inputs 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_bundled_inputs_1.1_395d728a16287961_.log
2025-12-04T16:09:25.4523600Z Running 12 items in this shard: test/test_bundled_inputs.py::TestBundledInputs::test_bad_inputs, test/test_bundled_inputs.py::TestBundledInputs::test_dict_args, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_fail, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_non_mutator, test/test_bundled_inputs.py::TestBundledInputs::test_double_augment_success, test/test_bundled_inputs.py::TestBundledInputs::test_large_tensor_with_inflation, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs_both_defined_failure, test/test_bundled_inputs.py::TestBundledInputs::test_multiple_methods_with_inputs_neither_defined_failure, test/test_bundled_inputs.py::TestBundledInputs::test_non_tensors, test/test_bundled_inputs.py::TestBundledInputs::test_rejected_tensors, test/test_bundled_inputs.py::TestBundledInputs::test_single_tensors
2025-12-04T16:09:25.4528471Z 
2025-12-04T16:09:25.4528797Z Finished test_bundled_inputs 1/1 ... [2025-12-04 16:09:25.451548][24923.061456647], took 0.10min
2025-12-04T16:09:25.4934070Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_bundled_inputs/test_bundled_inputs-35f6835618e9721e.xml
2025-12-04T16:09:25.5752465Z Running torch_np/numpy_tests/core/test_numeric 1/1 ... [2025-12-04 16:09:25.574957][24923.184864927]
2025-12-04T16:09:25.5753136Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:09:25.5756592Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_numeric.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:25.575415]
2025-12-04T16:09:35.4033612Z 
2025-12-04T16:09:35.4034848Z torch_np/numpy_tests/core/test_numeric 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_numeric_1.1_c2ce2dbd13566161_.log
2025-12-04T16:09:35.4150979Z Running 273 items in this shard: test/torch_np/numpy_tests/core/test_numeric.py::TestResize::test_copies, test/torch_np/numpy_tests/core/test_numeric.py::TestResize::test_negative_resize, test/torch_np/numpy_tests/core/test_numeric.py::TestResize::test_repeats, test/torch_np/numpy_tests/core/test_numeric.py::TestResize::test_reshape_from_zero, test/torch_np/numpy_tests/core/test_numeric.py::TestResize::test_zeroresize, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_choose, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_clip, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_compress, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_count_nonzero, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_cumproduct, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_diagonal, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_accuracy, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype0, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype1, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype2, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype3, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype4, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype5, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype6, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_dtype7, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_edgecases_val_2147483647_ndigits_-1, 
test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_edgecases_val_2147483647_ndigits_-10, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_dunder_round_edgecases_val_2147483647_ndigits_-9, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_mean, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_prod, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_ptp, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_ravel, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_repeat, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_reshape, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_round, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_round_2, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_round_py_consistency, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_searchsorted, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_size, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_squeeze, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_std, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_sum, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_swapaxes, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_take, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_trace, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_transpose, test/torch_np/numpy_tests/core/test_numeric.py::TestNonarrayArgs::test_var, test/torch_np/numpy_tests/core/test_numeric.py::TestIsscalar::test_isscalar, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_and_eq, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_and_is, 
test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_or_eq, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_or_is, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_xor_eq, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_bitwise_xor_is, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolScalar::test_logical, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolArray::test_all_any, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolArray::test_logical_and_or_xor, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolArray::test_logical_not_abs, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolCmp::test_double, test/torch_np/numpy_tests/core/test_numeric.py::TestBoolCmp::test_float, test/torch_np/numpy_tests/core/test_numeric.py::TestSeterr::test_default, test/torch_np/numpy_tests/core/test_numeric.py::TestSeterr::test_divide_err, test/torch_np/numpy_tests/core/test_numeric.py::TestSeterr::test_errobj, test/torch_np/numpy_tests/core/test_numeric.py::TestSeterr::test_set, test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_floating_exceptions_typecode_D, test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_floating_exceptions_typecode_F, test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_floating_exceptions_typecode_d, test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_floating_exceptions_typecode_e, test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_floating_exceptions_typecode_f, test/torch_np/numpy_tests/core/test_numeric.py::TestFloatExceptions::test_warnings, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_can_cast, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_can_cast_2, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_can_cast_values, 
test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_coercion, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_coercion_2, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_promote_types_endian, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_result_type, test/torch_np/numpy_tests/core/test_numeric.py::TestTypes::test_tesult_type_2, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_2592_dtype0_count_10_error_index_5, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_2592_dtype0_count_10_error_index_9, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_empty_result, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_failed_itemsetting, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_lengths, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_too_few_items, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_types, test/torch_np/numpy_tests/core/test_numeric.py::TestFromiter::test_values, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_?, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_B, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_D, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_F, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_b, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_d, 
test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_e, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_f, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_h, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_i, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_axis_all_dtypes_typecode_l, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_count_nonzero_list, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_countnonzero_axis_empty, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_countnonzero_keepdims, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_onedim, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_onedim_differs, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_trivial, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_trivial_differs, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_twodim, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_zerod, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_nonzero_zerod_differs, test/torch_np/numpy_tests/core/test_numeric.py::TestNonzeroAndCountNonzero::test_sparse, test/torch_np/numpy_tests/core/test_numeric.py::TestIndex::test_boolean, test/torch_np/numpy_tests/core/test_numeric.py::TestIndex::test_boolean_edgecase, test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_large_neg_int64, 
test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_neg_width_boundaries, test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_negative, test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_positive, test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_sufficient_width, test/torch_np/numpy_tests/core/test_numeric.py::TestBinaryRepr::test_zero, test/torch_np/numpy_tests/core/test_numeric.py::TestBaseRepr::test_base3, test/torch_np/numpy_tests/core/test_numeric.py::TestBaseRepr::test_base_range, test/torch_np/numpy_tests/core/test_numeric.py::TestBaseRepr::test_negative, test/torch_np/numpy_tests/core/test_numeric.py::TestBaseRepr::test_positive, test/torch_np/numpy_tests/core/test_numeric.py::TestArrayComparisons::test_array_equal, test/torch_np/numpy_tests/core/test_numeric.py::TestArrayComparisons::test_array_equal_equal_nan, test/torch_np/numpy_tests/core/test_numeric.py::TestArrayComparisons::test_array_equiv, test/torch_np/numpy_tests/core/test_numeric.py::TestArrayComparisons::test_none_compares_elementwise, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_array_double, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_complex, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_func_takes_out, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_inplace_array, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_inplace_simple, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_nan, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_non_contig, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_property, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_scalar_nan_propagation_arr0_amin0_amax0, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_value_min_max_flip_amin2_amax2, 
test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_value_min_max_flip_amin_1_amax1, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_value_min_max_flip_amin_1_amax_0, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_array_int32, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_array_outint32, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_memory_overlap, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_simple, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_simple2, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_simple_int32, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_clip_with_out_transposed, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_noncontig_inplace, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_2_dtype_D, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_2_dtype_F, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_2_dtype_e, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_?, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_B, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_b, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_d, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_f, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_h, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_i, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_ones_pathological_dtype_l, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_complex, 
test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_double, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_inplace_01, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_inplace_02, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int32_inout_casting0, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int32_inout_casting_unsafe, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int32_out, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int64_inout, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_int64_out, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_nonnative, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_simple_out, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_01, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_02, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_03, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_04, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_05, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_06, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_07, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_08, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_09, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_10, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_11, test/torch_np/numpy_tests/core/test_numeric.py::TestClip::test_type_cast_12, test/torch_np/numpy_tests/core/test_numeric.py::TestAllclose::test_equalnan, test/torch_np/numpy_tests/core/test_numeric.py::TestAllclose::test_ip_allclose, 
test/torch_np/numpy_tests/core/test_numeric.py::TestAllclose::test_ip_not_allclose, test/torch_np/numpy_tests/core/test_numeric.py::TestAllclose::test_min_int, test/torch_np/numpy_tests/core/test_numeric.py::TestAllclose::test_no_parameter_modification, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_equal_nan, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_ip_all_isclose, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_ip_isclose, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_ip_isclose_allclose, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_ip_none_isclose, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_no_parameter_modification, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_non_finite_scalar, test/torch_np/numpy_tests/core/test_numeric.py::TestIsclose::test_scalar_return, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVar::test_basic, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVar::test_ddof1, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVar::test_ddof2, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVar::test_out_scalar, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVar::test_scalars, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVarComplex::test_basic, test/torch_np/numpy_tests/core/test_numeric.py::TestStdVarComplex::test_scalars, test/torch_np/numpy_tests/core/test_numeric.py::TestCreationFuncs::test_empty, test/torch_np/numpy_tests/core/test_numeric.py::TestCreationFuncs::test_for_reference_leak, test/torch_np/numpy_tests/core/test_numeric.py::TestCreationFuncs::test_full, test/torch_np/numpy_tests/core/test_numeric.py::TestCreationFuncs::test_ones, test/torch_np/numpy_tests/core/test_numeric.py::TestCreationFuncs::test_zeros, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc0_dtype0, 
test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc0_dtype1, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc1_dtype0, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc1_dtype1, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc2_dtype0, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc2_dtype1, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc3_dtype0, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_dtype_str_bytes_likefunc3_dtype1, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_empty_like, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_filled_like, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_ones_like, test/torch_np/numpy_tests/core/test_numeric.py::TestLikeFuncs::test_zeros_like, test/torch_np/numpy_tests/core/test_numeric.py::TestCorrelate::test_complex, test/torch_np/numpy_tests/core/test_numeric.py::TestCorrelate::test_float, test/torch_np/numpy_tests/core/test_numeric.py::TestCorrelate::test_mode, test/torch_np/numpy_tests/core/test_numeric.py::TestCorrelate::test_no_overwrite, test/torch_np/numpy_tests/core/test_numeric.py::TestCorrelate::test_zero_size, test/torch_np/numpy_tests/core/test_numeric.py::TestConvolve::test_mode, test/torch_np/numpy_tests/core/test_numeric.py::TestConvolve::test_no_overwrite, test/torch_np/numpy_tests/core/test_numeric.py::TestConvolve::test_numpy_doc_examples, test/torch_np/numpy_tests/core/test_numeric.py::TestConvolve::test_object, test/torch_np/numpy_tests/core/test_numeric.py::TestDtypePositional::test_dtype_positional, test/torch_np/numpy_tests/core/test_numeric.py::TestArgwhere::test_2D, test/torch_np/numpy_tests/core/test_numeric.py::TestArgwhere::test_list, 
test/torch_np/numpy_tests/core/test_numeric.py::TestArgwhere::test_nd_nd_0, test/torch_np/numpy_tests/core/test_numeric.py::TestArgwhere::test_nd_nd_1, test/torch_np/numpy_tests/core/test_numeric.py::TestArgwhere::test_nd_nd_2, test/torch_np/numpy_tests/core/test_numeric.py::TestStringFunction::test_set_string_function, test/torch_np/numpy_tests/core/test_numeric.py::TestRoll::test_roll1d, test/torch_np/numpy_tests/core/test_numeric.py::TestRoll::test_roll2d, test/torch_np/numpy_tests/core/test_numeric.py::TestRoll::test_roll_empty, test/torch_np/numpy_tests/core/test_numeric.py::TestRollaxis::test_exceptions, test/torch_np/numpy_tests/core/test_numeric.py::TestRollaxis::test_results, test/torch_np/numpy_tests/core/test_numeric.py::TestMoveaxis::test_errors, test/torch_np/numpy_tests/core/test_numeric.py::TestMoveaxis::test_move_multiples, test/torch_np/numpy_tests/core/test_numeric.py::TestMoveaxis::test_move_new_position, test/torch_np/numpy_tests/core/test_numeric.py::TestMoveaxis::test_move_to_end, test/torch_np/numpy_tests/core/test_numeric.py::TestMoveaxis::test_preserve_order, test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_2x2, test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_2x3, test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_3x3, test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_broadcasting, test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_broadcasting_shapes, test/torch_np/numpy_tests/core/test_numeric.py::TestCross::test_uint8_int32_mixed_dtypes, test/torch_np/numpy_tests/core/test_numeric.py::TestOuterMisc::test_outer_out_param, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype0_dims0, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype0_dims1, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype0_dims2, 
test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype1_dims0, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype1_dims1, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype1_dims2, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype2_dims0, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype2_dims1, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype2_dims2, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype3_dims0, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype3_dims1, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_return_type_dtype3_dims2, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_scalar_input, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_simple, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_single_input, test/torch_np/numpy_tests/core/test_numeric.py::TestIndices::test_sparse, test/torch_np/numpy_tests/core/test_numeric.py::TestRequire::test_C_and_F_simul, test/torch_np/numpy_tests/core/test_numeric.py::TestRequire::test_non_array_input, test/torch_np/numpy_tests/core/test_numeric.py::TestRequire::test_require_each, test/torch_np/numpy_tests/core/test_numeric.py::TestRequire::test_unknown_requirement, test/torch_np/numpy_tests/core/test_numeric.py::TestBroadcast::test_broadcast_error_kwargs, test/torch_np/numpy_tests/core/test_numeric.py::TestBroadcast::test_broadcast_in_args, test/torch_np/numpy_tests/core/test_numeric.py::TestBroadcast::test_broadcast_single_arg, test/torch_np/numpy_tests/core/test_numeric.py::TestBroadcast::test_number_of_arguments, test/torch_np/numpy_tests/core/test_numeric.py::TestBroadcast::test_shape_mismatch_error_message, 
test/torch_np/numpy_tests/core/test_numeric.py::TestTensordot::test_zero_dimension, test/torch_np/numpy_tests/core/test_numeric.py::TestTensordot::test_zero_dimension_einsum, test/torch_np/numpy_tests/core/test_numeric.py::TestTensordot::test_zero_dimensional
2025-12-04T16:09:35.4265500Z 
2025-12-04T16:09:35.4265930Z Finished torch_np/numpy_tests/core/test_numeric 1/1 ... [2025-12-04 16:09:35.403600][24933.013507384], took 0.16min
2025-12-04T16:09:35.4452979Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numeric/torch_np.numpy_tests.core.test_numeric-1a155fd517c13e25.xml
2025-12-04T16:09:35.5310561Z Running torch_np/numpy_tests/core/test_multiarray 1/1 ... [2025-12-04 16:09:35.530816][24933.140725026]
2025-12-04T16:09:35.5311211Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:09:35.5314885Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/core/test_multiarray.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:09:35.531226]
2025-12-04T16:10:09.9430715Z 
2025-12-04T16:10:09.9432054Z torch_np/numpy_tests/core/test_multiarray 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.core.test_multiarray_1.1_f5a85c7d65f3960a_.log
2025-12-04T16:10:09.9860071Z Running 864 items in this shard: test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_otherflags, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_readonly_flag_protocols_flag__warn_on_write_flag_value_True_writeable_False, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_readonly_flag_protocols_flag_writeable_flag_value_False_writeable_False, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_readonly_flag_protocols_flag_writeable_flag_value_True_writeable_True, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_string_align, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_void_align, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_warnonwrite, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_any_base, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_from_buffer, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_from_readonly, test/torch_np/numpy_tests/core/test_multiarray.py::TestFlag::test_writeable_pickle, test/torch_np/numpy_tests/core/test_multiarray.py::TestHash::test_int, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_attributes, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_attributes_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_dtypeattr, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill_max_uint64, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill_readonly, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_fill_struct_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_set_stridesattr, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestAttributes::test_stridesattr, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_0d_array_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_asanyarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_asarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_ascontiguousarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_as_keyword_asfortranarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_cont, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_false, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_false_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_true, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_copy_true_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_empty, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_array_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_asanyarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_asarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_ascontiguousarray, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayConstruction::test_bad_arguments_error_asfortranarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_assignment_broadcasting, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_assignment_errors, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_cast_to_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_longdouble_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_stringlike_empty_list, test/torch_np/numpy_tests/core/test_multiarray.py::TestAssignment::test_unicode_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestDtypedescr::test_construction, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_ellipsis_subscript, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_empty_subscript, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_invalid_newaxis, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_invalid_subscript, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_invalid_subscript_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_newaxis, test/torch_np/numpy_tests/core/test_multiarray.py::TestScalarIndexing::test_overlapping_assignment, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_array_of_ragged_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_array_too_big, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_deep_nonragged_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_empty_unicode, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_failed_len_sequence, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_false_len_iterable, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_false_len_sequence, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_from_attribute, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_from_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_malloc_fails, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_no_len_object_type, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_non_sequence_sequence, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype0_function0, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype0_function1, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype0_function2, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_(2,3)O_function0, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_(2,3)O_function1, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_(2,3)O_function2, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,(3)O_function0, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,(3)O_function1, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,(3)O_function2, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,O_function0, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,O_function1, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_object_initialized_to_None_dtype_O,O_function2, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_ragged_ndim_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_ragged_shape_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_sequence_non_homogeneous, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_structured_void_promotion_arr, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_structured_void_promotion_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_too_big_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_void, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_big, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_like_like_zeros, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_obj, test/torch_np/numpy_tests/core/test_multiarray.py::TestCreation::test_zeros_obj_obj, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_cast_from_bytes, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_cast_from_unicode, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_cast_from_void, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_count_nonzero, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_count_nonzero_all, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_count_nonzero_unaligned, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_sum, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_sum_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestBool::test_test_interning, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test__complex__, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test__complex__should_not_work, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test__deepcopy___dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_all_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_any_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_empty_array_kth_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_gh5524_kth_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_integer, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_?, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_b, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_e, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argpartition_out_of_range_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argsort, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argsort_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_argsort_complex, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_2_func0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_2_func1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_func0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_arr_mult_func1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_choose, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_choose_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_compress, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_conjugate, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_conjugate_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_copy, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_diagonal, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_diagonal_memleak, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_diagonal_view_notwriteable, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_dot, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_dot_out_mem_overlap, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_flatten, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_matmul_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func0_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_2_func1_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func0_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_F, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_f, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_no_dgemv_func1_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_empty_array_kth_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_fuzz, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_integer, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_iterative, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_?, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_B, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_D, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_b, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_d, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_e, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_f, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_h, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_i, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_partition_out_of_range_dtype_l, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_prod, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_put, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_ravel, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_repeat, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_reshape, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_round, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_complex, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_floats_default_dtype, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_floats_f16, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_floats_f32, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_n_elements, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_resetting, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_type_specific, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_type_specific_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_unaligned_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_with_invalid_sorter, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_searchsorted_with_sorter, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_size_zero_memleak, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype0_part_imag, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype0_part_real, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype1_part_imag, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_dtype1_part_real, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_complex_nans, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_degraded, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype5, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_signed_dtype6, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_size_0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_sort_unsigned_dtype3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_squeeze, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_swapaxes, test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_trace, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMethods::test_transpose, test/torch_np/numpy_tests/core/test_multiarray.py::TestCequenceMethods::test_array_contains, test/torch_np/numpy_tests/core/test_multiarray.py::TestBinop::test_inplace, test/torch_np/numpy_tests/core/test_multiarray.py::TestSubscripting::test_test_zero_rank, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_assign_mask, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_assign_mask2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_list, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_mask, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_mask2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFancyIndexing::test_tuple, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_all_method_max, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_all_method_min, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size0_axis0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size10_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size11_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size12_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size13_axis13_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size14_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size15_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size16_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size17_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size18_axis18_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size19_axis_-3_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size1_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size20_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size21_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size22_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size23_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size24_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size25_axis25_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size26_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size27_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size28_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size29_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size2_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size30_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size31_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size32_axis32_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size33_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size34_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size35_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size36_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size37_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size38_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size39_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size3_axis3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size40_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size41_axis41_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size42_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size43_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size44_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size45_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size46_axis_0_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size47_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size48_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size49_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size4_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size50_axis50_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size51_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size52_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size53_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size54_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size55_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size56_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size57_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size58_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size59_axis59_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size5_axis_-1_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size60_axis_-4_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size61_axis_-3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size62_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size63_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size64_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size65_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size66_axis_2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size67_axis_3_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size68_axis68_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size69_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size6_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size70_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size71_axis71_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size72_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size73_axis_0_method1, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size74_axis74_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size75_axis_-1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size76_axis_0_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size77_axis77_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size7_axis_1_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size8_axis8_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_argmin_argmax_keepdims_size9_axis_-2_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmax_np_method0, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_np_vs_ndarray_arr_method_argmin_np_method1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_output_shape_method_argmax, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_output_shape_method_argmin, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmax, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_0_method_argmin, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmax, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmaxArgminCommon::test_ret_is_out_ndim_1_method_argmin, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data10, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data11, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data12, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data13, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data14, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data15, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data16, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data17, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data18, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data19, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data2, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data20, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data21, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data22, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data23, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data24, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data25, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data26, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data27, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data28, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data29, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data3, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data30, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data31, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data32, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data33, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data34, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data35, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data36, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data37, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data38, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data39, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data4, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data40, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data41, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data42, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data43, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data44, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data45, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data46, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data47, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data48, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data49, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data5, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data50, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data51, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data52, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data53, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data54, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data55, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data56, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data57, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data58, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data59, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data6, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data60, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data61, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data7, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data8, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_combinations_data9, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmax::test_maximum_signed_integers, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data10, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data11, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data12, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data13, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data14, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data15, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data16, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data17, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data18, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data19, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data20, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data21, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data22, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data23, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data24, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data25, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data26, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data27, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data28, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data29, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data3, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data30, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data31, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data32, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data33, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data34, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data35, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data36, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data37, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data38, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data39, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data4, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data40, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data41, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data42, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data43, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data44, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data45, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data46, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data47, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data48, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data49, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data5, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data50, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data51, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data52, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data53, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data54, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data55, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data56, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data57, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data58, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data59, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data6, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data60, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data61, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data7, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data8, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_combinations_data9, test/torch_np/numpy_tests/core/test_multiarray.py::TestArgmin::test_minimum_signed_integers, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinMax::test_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinMax::test_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestNewaxis::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestClip::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestClip::test_max_or_min, test/torch_np/numpy_tests/core/test_multiarray.py::TestClip::test_nan, test/torch_np/numpy_tests/core/test_multiarray.py::TestCompress::test_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestCompress::test_flatten, test/torch_np/numpy_tests/core/test_multiarray.py::TestCompress::test_truncate, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_byteorder_greater_False, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_byteorder_greater_True, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_ip_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_kwargs, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_mask_size, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_overlaps, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_record_array, test/torch_np/numpy_tests/core/test_multiarray.py::TestPutmask::test_writeable, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_clip, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ip_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_out_overlap, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_raise, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ret_is_out_shape0, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ret_is_out_shape1, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_ret_is_out_shape2, test/torch_np/numpy_tests/core/test_multiarray.py::TestTake::test_wrap, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype3, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype4, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype5, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype6, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_basic_dtype7, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_datetime, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_invalid_axis, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_mixed, test/torch_np/numpy_tests/core/test_multiarray.py::TestLexsort::test_object, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_ascii, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_big_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_bool_fromstring, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_counted_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_counted_string_with_ws, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_dtype, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_dtype_bool, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_empty_files_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_empty_files_text, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_file_position_after_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_file_position_after_tofile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromfile_bad_dup, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromfile_offset, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromfile_subarray_binary, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_fromstring_count0, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_inf, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_int64_fromstring, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_io_open_buffered_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_io_open_unbuffered_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_largish_file, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_load_object_array_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_long_sep, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_malformed, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_nan, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_nofile, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_numbers, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_parsing_subarray_unsupported, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_read_shorter_than_count_subarray, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_binary_str, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_dump_pathlib, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_file, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_repr, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_roundtrip_str, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_string_with_ws, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_tofile_cleanup, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_tofile_format, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_tofile_sep, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_uint64_fromstring, test/torch_np/numpy_tests/core/test_multiarray.py::TestIO::test_unseekable_fromfile, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_array_base_obj0, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_array_base_obj_12345678, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_big_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_big_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_big_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_little_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_little_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_basic_little_dtype2, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_empty, test/torch_np/numpy_tests/core/test_multiarray.py::TestFromBuffer::test_mmap_close, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_0d_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_check_reference, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_check_weakref, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_empty_view, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_freeform_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_int_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_invalid_arguments, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_none_shape, test/torch_np/numpy_tests/core/test_multiarray.py::TestResize::test_zeros_appended, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_ddof, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_ddof_too_big, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_dtype_from_dtype, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_dtype_from_input, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_empty, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_keepdims, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_axis_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_float16, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_mean_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_python_type, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_std_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_std_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_axis_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_complex_byteorder, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_complex_values_complex_dtype_complex128_ndec_7, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_complex_values_complex_dtype_complex64_ndec_6, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_dimensions, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestStats::test_var_where, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_vdot_array_order, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_vdot_uncontiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestVdot::test_vdot_uncontiguous_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_accelerate_framework_sgemv_fix, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_all, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_2args, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_3args, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_3args_errors, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dot_array_order, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotcolumnvect1, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotcolumnvect2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotmatmat, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotmatvec, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotmatvec2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecmat, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecmat2, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecmat3, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecscalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecscalar2, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecvecinner, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_dotvecvecouter, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_huge_vectordot_dtype0, test/torch_np/numpy_tests/core/test_multiarray.py::TestDot::test_huge_vectordot_dtype1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mm4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmN1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmN2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmN3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT5, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mmT6, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mv11, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mv12, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN3, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN5, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN6, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN7, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN8, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvN9, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_mvn10, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_s0_4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm1, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm3, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_dot_equivalent_vm4, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_empty_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_exceptions, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matmul_bool, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matmul_exception_add, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matmul_exception_multiply, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matrix_matrix_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_matrix_vector_values, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_out_arg, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_out_contiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_out_contiguous_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_result_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_result_types_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_scalar_output, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_shapes, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_vector_matrix_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmul::test_vector_vector_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_array_priority_override, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_exceptions, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_axes, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_inplace, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_inplace_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matmul_raises, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matrix_matrix_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_matrix_vector_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_result_types, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_result_types_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_scalar_output, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_shapes, test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_vector_matrix_values, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMatmulOperator::test_vector_vector_values, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_3d_tensor, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_inner_product_reversed_view, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_inner_product_with_various_contiguities, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_inner_scalar_and_vector, test/torch_np/numpy_tests/core/test_multiarray.py::TestInner::test_vecself, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_broadcast1, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_broadcast2, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_docstring_1, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_docstring_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_docstring_3, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops0, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops1, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops2, test/torch_np/numpy_tests/core/test_multiarray.py::TestChoose::test_output_dtype_ops3, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_axis_spec, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_broadcast1, test/torch_np/numpy_tests/core/test_multiarray.py::TestRepeat::test_broadcast2, test/torch_np/numpy_tests/core/test_multiarray.py::TestWarnings::test_complex_warning, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_complex, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_float, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_nonscalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestMinScalarType::test_usigned_shortshort, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_byteorder_inside_struct, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_char_vs_string, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_field_order, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_intra_padding, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_native_padding, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_native_padding_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_native_padding_3, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_padding_with_array_inside_struct, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_trailing_padding, test/torch_np/numpy_tests/core/test_multiarray.py::TestPEP3118Dtype::test_unnamed_fields, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test___array__, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_array_interfaces, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_buffer_interface, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_compatible_cast, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_F, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order12_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_C_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr0_order1_F_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order12_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_C, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_C_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_A, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_C, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_F, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_order_mismatch_arr1_order1_F_order2_K, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_scalars, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayCreationCopyArgument::test_striding_not_ok, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_flags_not_writable_attribute_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_flags_writable_attribute_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_not_writable_attributes_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestArrayAttributeDeletion::test_multiarray_writable_attributes_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestDelMisc::test_flat_element_deletion, test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_array_scalar_relational_operation, test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_to_bool_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_to_int_scalar, test/torch_np/numpy_tests/core/test_multiarray.py::TestConversion::test_to_int_scalar_2, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_basic, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_dtype_mix, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_empty_result, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_error, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_exotic, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_exotic_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_foreign, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_kwargs, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_largedim, test/torch_np/numpy_tests/core/test_multiarray.py::TestWhere::test_ndim, test/torch_np/numpy_tests/core/test_multiarray.py::TestHashing::test_arrays_not_hashable, test/torch_np/numpy_tests/core/test_multiarray.py::TestHashing::test_collections_hashable, test/torch_np/numpy_tests/core/test_multiarray.py::TestFormat::test_0d, test/torch_np/numpy_tests/core/test_multiarray.py::TestFormat::test_1d_format, test/torch_np/numpy_tests/core/test_multiarray.py::TestFormat::test_1d_no_format, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_argmax_with_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_argmin_with_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_choose_mod_raise, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_dot_out, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_flatiter__array__, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_insert_noncontiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_put_noncontiguous, test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_putmask_noncontiguous, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestWritebackIfCopy::test_take_mode_raise, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_arange_booleans, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_error_paths_and_promotion_which_0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_error_paths_and_promotion_which_1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_error_paths_and_promotion_which_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_explicit_dtype_dt0, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_explicit_dtype_dt1, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_explicit_dtype_dt2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_infinite, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_nan_step, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_require_range, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_require_range_2, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_start_stop_kwarg, test/torch_np/numpy_tests/core/test_multiarray.py::TestArange::test_zero_step, test/torch_np/numpy_tests/core/test_multiarray.py::TestRichcompareScalar::test_richcompare_scalar_boolean_singleton_return, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_1023, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_128, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_151, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_16, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_191, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_2047, 
test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_24, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_256, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_32, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_383, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_48, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_512, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_64, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_8, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_float_N_96, test/torch_np/numpy_tests/core/test_multiarray.py::TestSortFloatMisc::test_sort_int
2025-12-04T16:10:10.0275692Z 
2025-12-04T16:10:10.0276171Z Finished torch_np/numpy_tests/core/test_multiarray 1/1 ... [2025-12-04 16:10:09.944262][24967.554168142], took 0.57min
2025-12-04T16:10:10.0277723Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.core.test_multiarray/torch_np.numpy_tests.core.test_multiarray-86fe7342be381be4.xml
2025-12-04T16:10:10.0813633Z Running test_itt 1/1 ... [2025-12-04 16:10:10.081057][24967.690964227]
2025-12-04T16:10:10.0814146Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:10:10.0817471Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_itt.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:10:10.081494]
2025-12-04T16:10:15.4536018Z 
2025-12-04T16:10:15.4536904Z test_itt 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_itt_1.1_0c67806275155360_.log
2025-12-04T16:10:15.4538042Z Running 1 items in this shard: test/test_itt.py::TestItt::test_itt
2025-12-04T16:10:15.4538433Z 
2025-12-04T16:10:15.4538693Z Finished test_itt 1/1 ... [2025-12-04 16:10:15.453395][24973.063305367], took 0.09min
2025-12-04T16:10:15.4959467Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_itt/test_itt-7f15e1ebb20f1faf.xml
2025-12-04T16:10:15.5192739Z Running torch_np/numpy_tests/lib/test_function_base 1/1 ... [2025-12-04 16:10:15.518995][24973.128903197]
2025-12-04T16:10:15.5193409Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:10:15.5196840Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_function_base.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:10:15.519423]
2025-12-04T16:10:23.3949570Z 
2025-12-04T16:10:23.3950975Z torch_np/numpy_tests/lib/test_function_base 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_function_base_1.1_66e1a2bc19dbe7b5_.log
2025-12-04T16:10:23.4203395Z Running 505 items in this shard: test/torch_np/numpy_tests/lib/test_function_base.py::TestRot90::test_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestRot90::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestRot90::test_rotation_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_3d_swap_axis0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_3d_swap_axis1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_3d_swap_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_4d, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_basic_lr, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_basic_ud, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_default_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestFlip::test_multiple_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestAny::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestAny::test_nd, test/torch_np/numpy_tests/lib/test_function_base.py::TestAll::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestAll::test_nd, test/torch_np/numpy_tests/lib/test_function_base.py::TestCopy::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCopy::test_order, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_average_class_without_dtype, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_basic_keepdims_x0_axis0_expected_avg0_weights0_expected_wavg0_expected_wsum0, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_basic_keepdims_x1_axis_0_expected_avg1_weights1_expected_wavg1_expected_wsum1, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_returned, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_upcasting, test/torch_np/numpy_tests/lib/test_function_base.py::TestAverage::test_weights, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_broadcasting, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_deprecated_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_many_arguments, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_non_bool_deprecation, test/torch_np/numpy_tests/lib/test_function_base.py::TestSelect::test_return_dtype, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_0d, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_array_copied, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_floats, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_out_of_bounds_idx_-4, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_index_out_of_bounds_idx_4, test/torch_np/numpy_tests/lib/test_function_base.py::TestInsert::test_multidim, test/torch_np/numpy_tests/lib/test_function_base.py::TestAmax::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestAmin::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestPtp::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCumsum::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestProd::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCumprod::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_append, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_axis, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_n, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_nd, test/torch_np/numpy_tests/lib/test_function_base.py::TestDiff::test_prepend, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_0d, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_array_order_preserve, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_fancy, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_index_floats, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single_item_array_[1], test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single_item_array_array([1]), test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_single_item_array_non_int, test/torch_np/numpy_tests/lib/test_function_base.py::TestDelete::test_slices, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_args, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_badargs, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_decreasing_unsigned_int_f_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype1, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype2, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_f_signed_int_big_jump_f_dtype3, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_inexact_dtypes, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_second_order_accurate, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_spacing, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_specific_axes, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_values, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_decreasing_unsigned_x_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype0, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype1, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype2, test/torch_np/numpy_tests/lib/test_function_base.py::TestGradient::test_x_signed_int_big_jump_x_dtype3, test/torch_np/numpy_tests/lib/test_function_base.py::TestAngle::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_all_zero, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_leading_skip, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_list_to_list, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_no_trim, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_overflow_arr0, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_size_zero, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrimZeros::test_trailing_skip, test/torch_np/numpy_tests/lib/test_function_base.py::TestExtins::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestExtins::test_both, test/torch_np/numpy_tests/lib/test_function_base.py::TestExtins::test_place, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_casting_error, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_forward, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_large_integers_decreasing, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_large_integers_increasing, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_monotonic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_random, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_reverse, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_open, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_open_random, test/torch_np/numpy_tests/lib/test_function_base.py::TestDigitize::test_right_open_reverse, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_e_M_1, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_bartlett_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_d_M_0, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_blackman_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_B_M_10, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_l_M_1, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hamming_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_f_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_i_M_0, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_hanning_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_B_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_B_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_B_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_b_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_b_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_b_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_d_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_d_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_d_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_e_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_e_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_e_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_f_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_f_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_f_M_10, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_h_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_h_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_h_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_i_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_i_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_i_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_l_M_0, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_l_M_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestFilterwindows::test_kaiser_dtype_l_M_10, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrapz::test_ndim, test/torch_np/numpy_tests/lib/test_function_base.py::TestTrapz::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestSinc::test_array_like, test/torch_np/numpy_tests/lib/test_function_base.py::TestSinc::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestUnique::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestUnique::test_simple_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestCheckFinite::test_dtype_order, test/torch_np/numpy_tests/lib/test_function_base.py::TestCheckFinite::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_bias, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_corrcoef_dtype_test_type0, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_corrcoef_dtype_test_type1, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_corrcoef_dtype_test_type2, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_ddof, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_extreme, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_non_array, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestCorrCoef::test_xy, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_1D_rowvar, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_1D_variance, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_aweights, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_cov_dtype_test_type0, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_cov_dtype_test_type1, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_cov_dtype_test_type2, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_fweights, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_unit_fweights_and_aweights, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_wrong_ddof, test/torch_np/numpy_tests/lib/test_function_base.py::TestCov::test_xy, test/torch_np/numpy_tests/lib/test_function_base.py::Test_I0::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::Test_I0::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestKaiser::test_int_beta, test/torch_np/numpy_tests/lib/test_function_base.py::TestKaiser::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestMsort::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_indexing, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_invalid_arguments, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_nd_indexing, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_nd_shape, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_nd_values, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_no_input, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_return_type, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_single_input, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_sparse, test/torch_np/numpy_tests/lib/test_function_base.py::TestMeshgrid::test_writeback, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_0d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_0d_0d_condition, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_0d_comparison, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_default, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_multidimensional_extrafunc, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_scalar_domains_three_conditions, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestPiecewise::test_two_conditions, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_dtype_reference_leaks, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_empty_with_minlength, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_error_not_1d_vals0, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_error_not_1d_vals_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple2, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple_weight, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_simple_weight2, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_incorrect_minlength, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_minlength, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_minlength_and_weights, test/torch_np/numpy_tests/lib/test_function_base.py::TestBincount::test_with_minlength_smaller_than_maxvalue, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_complex_interp, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_exceptions, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_if_len_x_is_small, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_any_nan_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_behavior_exact_x, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_complex-real, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_f_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_x_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_half_inf_xf_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_complex-both, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_complex-imag, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_complex-real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_non_finite_inf_real, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_period, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_right_left_behavior, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_scalar_interpolation_point, test/torch_np/numpy_tests/lib/test_function_base.py::TestInterp::test_zero_dimensional_interpolation_point, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_2D, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_api, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_axis, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_exception, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_extended_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_extended_axis_invalid, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_fraction, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis0, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis3, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis4, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q1_axis_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis0, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis3, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis4, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_keepdims_out_q_7_axis_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_extrapolation, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_averaged_inverted_cdf_expected_27_5, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype5_expected_dtype5_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_interpolated_inverted_cdf_expected_20, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype6_expected_dtype6_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_median_unbiased_expected_27, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype7_expected_dtype7_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_B_expected_dtype0_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_averaged_inverted_cdf_expected_27_5, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_b_expected_dtype1_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_interpolated_inverted_cdf_expected_20, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_h_expected_dtype2_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_median_unbiased_expected_27, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_i_expected_dtype3_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_averaged_inverted_cdf_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_closest_observation_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_hazen_expected_27_5, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_interpolated_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_inverted_cdf_expected_20, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_linear_expected_29, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_median_unbiased_expected_27, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_normal_unbiased_expected_27_125, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_interpolation_input_dtype_l_expected_dtype4_method_weibull_expected_26, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_nan_1D_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_nan_1D_dtype_e, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_linear_nan_1D_dtype_f, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_b, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_e, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_f, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_h, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_lower_higher_dtype_l, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_b, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_e, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_f, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_h, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_midpoint_dtype_l, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nan_behavior, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nan_q, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_b, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_d, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_e, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_f, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_h, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_nearest_dtype_l, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_no_p_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_out, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_out_nan, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_empty_dim, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_list, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_no_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_out, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_percentile_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_scalar_q, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_scalar_q_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestPercentile::test_sequence, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_correct_quantile_value, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_fraction, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_max_ulp, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_no_p_overwrite, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_hypo, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_averaged_inverted_cdf, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_closest_observation, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_hazen, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_higher, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_interpolated_inverted_cdf, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_inverted_cdf, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_linear, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_lower, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_median_unbiased, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_midpoint, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_nearest, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_normal_unbiased, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_monotonic_method_weibull, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_B, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_b, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_h, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_i, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_preserve_int_type_dtype_l, test/torch_np/numpy_tests/lib/test_function_base.py::TestQuantile::test_quantile_scalar_nan, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_array_like, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_axis_keyword, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_basic, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_basic_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_empty, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_extended_axis, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_extended_axis_invalid, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis0, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis2, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis3, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis4, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_keepdims_out_axis_1, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_nan_behavior, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_nan_behavior_2, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_nan_behavior_3, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_out, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_out_nan, test/torch_np/numpy_tests/lib/test_function_base.py::TestMedian::test_overwrite_keyword, 
test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_complex, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_B_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_H_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_b_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_g_type_out_G, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_h_type_out_F, test/torch_np/numpy_tests/lib/test_function_base.py::TestSortComplex::test_sort_real_type_in_l_type_out_D
2025-12-04T16:10:23.4451025Z 
2025-12-04T16:10:23.4451514Z Finished torch_np/numpy_tests/lib/test_function_base 1/1 ... [2025-12-04 16:10:23.395619][24981.005524864], took 0.13min
2025-12-04T16:10:23.4453074Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_function_base/torch_np.numpy_tests.lib.test_function_base-c71be2950500ec80.xml
2025-12-04T16:10:23.5417534Z Running test_masked 1/1 ... [2025-12-04 16:10:23.541442][24981.151348803]
2025-12-04T16:10:23.5418066Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:10:23.5421573Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_masked.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:10:23.541888]
2025-12-04T16:10:59.0552801Z 
2025-12-04T16:10:59.0553800Z test_masked 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_masked_1.1_f4f98418cc401a0c_.log
2025-12-04T16:10:59.0636278Z Running 194 items in this shard: test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amax_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_amin_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_complex64, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_mean_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_prod_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_float64, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_coo_masked_sum_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amax_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_int8, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_amin_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_mean_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_prod_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_complex128, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_sparse_csr_masked_sum_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amax_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_float64, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_amin_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_mean_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_int8, 
test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_prod_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_bool, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_mask_layout_strided_masked_sum_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_log_softmax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_norm_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_bfloat16, 
test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_normalize_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmax_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_softmin_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_int8, 
test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_std_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_bfloat16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_complex128, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_complex64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_float16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_float32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_float64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int16, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int32, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int64, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_int8, test/test_masked.py::TestMaskedCUDA::test_reference_masked_masked_var_cuda_uint8, test/test_masked.py::TestMaskedCUDA::test_where_coo_fill_value_0_cuda, test/test_masked.py::TestMaskedCUDA::test_where_coo_fill_value_123_cuda, test/test_masked.py::TestMaskedCUDA::test_where_csr_fill_value_0_cuda, test/test_masked.py::TestMaskedCUDA::test_where_csr_fill_value_123_cuda, test/test_masked.py::TestMaskedCUDA::test_where_hybrid_coo_fill_value_0_cuda, test/test_masked.py::TestMaskedCUDA::test_where_hybrid_coo_fill_value_123_cuda
2025-12-04T16:10:59.0716093Z 
2025-12-04T16:10:59.0716398Z Finished test_masked 1/1 ... [2025-12-04 16:10:59.055443][25016.665349331], took 0.59min
2025-12-04T16:10:59.0988297Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_masked/test_masked-0947e6a84ac8b531.xml
2025-12-04T16:10:59.1850030Z Running optim/test_lrscheduler 1/1 ... [2025-12-04 16:10:59.184720][25016.794627735]
2025-12-04T16:10:59.1850612Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:10:59.1853895Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'optim/test_lrscheduler.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:10:59.185153]
2025-12-04T16:11:03.8578400Z 
2025-12-04T16:11:03.8579422Z optim/test_lrscheduler 1/1 was successful, full logs can be found in artifacts with path test/test-reports/optim.test_lrscheduler_1.1_50b469a96bd12a6b_.log
2025-12-04T16:11:03.8580227Z 
2025-12-04T16:11:03.8580594Z Finished optim/test_lrscheduler 1/1 ... [2025-12-04 16:11:03.857627][25021.467534489], took 0.08min
2025-12-04T16:11:03.9006730Z Running test_datapipe 1/1 ... [2025-12-04 16:11:03.900422][25021.510330343]
2025-12-04T16:11:03.9007267Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:11:03.9011318Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_datapipe.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:11:03.900829]
2025-12-04T16:11:25.8454842Z 
2025-12-04T16:11:25.8455801Z test_datapipe 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_datapipe_1.1_628e5e9adba39130_.log
2025-12-04T16:11:25.8491695Z Running 93 items in this shard: test/test_datapipe.py::TestDataChunk::test_as_string, test/test_datapipe.py::TestDataChunk::test_getitem, test/test_datapipe.py::TestDataChunk::test_iter, test/test_datapipe.py::TestDataChunk::test_len, test/test_datapipe.py::TestDataChunk::test_random_shuffle, test/test_datapipe.py::TestDataChunk::test_reverse, test/test_datapipe.py::TestDataChunk::test_sort, test/test_datapipe.py::TestStreamWrapper::test_api, test/test_datapipe.py::TestStreamWrapper::test_dir, test/test_datapipe.py::TestStreamWrapper::test_pickle, test/test_datapipe.py::TestStreamWrapper::test_repr, test/test_datapipe.py::TestIterableDataPipeBasic::test_demux_mux_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_groupby_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_listdirfiles_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_listdirfilesdeterministic_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_map_with_col_file_handle_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_openfilesfromdisk_iterable_datapipe, test/test_datapipe.py::TestIterableDataPipeBasic::test_routeddecoder_iterable_datapipe, test/test_datapipe.py::TestCaptureDataFrame::test_basic_capture, test/test_datapipe.py::TestDataFramesPipes::test_batch, test/test_datapipe.py::TestDataFramesPipes::test_capture, test/test_datapipe.py::TestDataFramesPipes::test_collate, test/test_datapipe.py::TestDataFramesPipes::test_filter, test/test_datapipe.py::TestDataFramesPipes::test_shuffle, test/test_datapipe.py::TestDataFramesPipes::test_unbatch, test/test_datapipe.py::TestFunctionalIterDataPipe::test_batch_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_collate_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_concat_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_demux_iterdatapipe, 
test/test_datapipe.py::TestFunctionalIterDataPipe::test_docstring, test/test_datapipe.py::TestFunctionalIterDataPipe::test_filter_datapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_fork_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_iterable_wrapper_datapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_map_dict_with_col_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_map_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_map_tuple_list_with_col_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_mux_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_sampler_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_serializable, test/test_datapipe.py::TestFunctionalIterDataPipe::test_serializable_with_dill, test/test_datapipe.py::TestFunctionalIterDataPipe::test_shuffler_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_stream_reader_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_unbatch_iterdatapipe, test/test_datapipe.py::TestFunctionalIterDataPipe::test_zip_iterdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_batch_mapdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_concat_mapdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_docstring, test/test_datapipe.py::TestFunctionalMapDataPipe::test_map_mapdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_sequence_wrapper_datapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_serializable, test/test_datapipe.py::TestFunctionalMapDataPipe::test_serializable_with_dill, test/test_datapipe.py::TestFunctionalMapDataPipe::test_shuffler_mapdatapipe, test/test_datapipe.py::TestFunctionalMapDataPipe::test_zip_mapdatapipe, test/test_datapipe.py::TestTyping::test_compile_time, test/test_datapipe.py::TestTyping::test_construct_time, test/test_datapipe.py::TestTyping::test_isinstance, 
test/test_datapipe.py::TestTyping::test_issubinstance, test/test_datapipe.py::TestTyping::test_protocol, test/test_datapipe.py::TestTyping::test_reinforce, test/test_datapipe.py::TestTyping::test_runtime, test/test_datapipe.py::TestTyping::test_subtype, test/test_datapipe.py::TestGraph::test_simple_traverse, test/test_datapipe.py::TestGraph::test_traverse_circular_datapipe, test/test_datapipe.py::TestGraph::test_traverse_forked, test/test_datapipe.py::TestGraph::test_traverse_mapdatapipe, test/test_datapipe.py::TestGraph::test_traverse_mixdatapipe, test/test_datapipe.py::TestGraph::test_traverse_unhashable_datapipe, test/test_datapipe.py::TestSerialization::test_spawn_lambdas_iter, test/test_datapipe.py::TestSerialization::test_spawn_lambdas_map, test/test_datapipe.py::TestCircularSerialization::test_circular_serialization_with_dill, test/test_datapipe.py::TestCircularSerialization::test_circular_serialization_with_pickle, test/test_datapipe.py::TestSharding::test_legacy_custom_sharding, test/test_datapipe.py::TestSharding::test_legacy_custom_sharding_with_old_dataloader, test/test_datapipe.py::TestSharding::test_multi_sharding, test/test_datapipe.py::TestSharding::test_old_dataloader, test/test_datapipe.py::TestSharding::test_sharding_groups, test/test_datapipe.py::TestSharding::test_sharding_length, test/test_datapipe.py::TestSharding::test_simple_sharding, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_buggy, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_constraint_multiple_outputs, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_generator, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_new_object, test/test_datapipe.py::TestIterDataPipeSingletonConstraint::test_iterdatapipe_singleton_self_next, 
test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_generator_function, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_generator_function_exception, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_next, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_next_exception, test/test_datapipe.py::TestIterDataPipeCountSampleYielded::test_iterdatapipe_sample_yielded_return_self, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_custom_non_generator, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_custom_self_next, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_graph, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_graph_repeated, test/test_datapipe.py::TestIterDataPipeGraphFastForward::test_simple_snapshot_graph_with_serialization
2025-12-04T16:11:25.8527107Z 
2025-12-04T16:11:25.8527400Z Finished test_datapipe 1/1 ... [2025-12-04 16:11:25.845371][25043.455281121], took 0.37min
2025-12-04T16:11:25.8880824Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_datapipe/test_datapipe-62d690fc79a0a517.xml
2025-12-04T16:11:25.9662386Z Running nn/test_convolution 1/1 ... [2025-12-04 16:11:25.965977][25043.575885447]
2025-12-04T16:11:25.9662927Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:11:25.9666243Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'nn/test_convolution.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:11:25.966381]
2025-12-04T16:12:11.7521007Z 
2025-12-04T16:12:11.7524083Z nn/test_convolution 1/1 was successful, full logs can be found in artifacts with path test/test-reports/nn.test_convolution_1.1_d98f421ddfbea09e_.log
2025-12-04T16:12:11.7939931Z Running 606 items in this shard: test/nn/test_convolution.py::TestConvolutionNN::test_Conv1d_module_same_padding, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_1x1, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_OneDNN, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_backward_twice, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_groups_nobias, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_groups_nobias_v2, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types_on_GPU_with_cudnn, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_inconsistent_types_on_GPU_without_cudnn, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_missing_argument, test/nn/test_convolution.py::TestConvolutionNN::test_Conv2d_module_same_padding, test/nn/test_convolution.py::TestConvolutionNN::test_Conv3d_groups_nobias, test/nn/test_convolution.py::TestConvolutionNN::test_Conv3d_groups_wbias, test/nn/test_convolution.py::TestConvolutionNN::test_Conv3d_module_same_padding, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose2d_half_cublas_gemm, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose2d_output_size, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose2d_output_size_downsample_upsample, test/nn/test_convolution.py::TestConvolutionNN::test_ConvTranspose3d_correct_output_size, test/nn/test_convolution.py::TestConvolutionNN::test_conv1d_issue_120547, test/nn/test_convolution.py::TestConvolutionNN::test_conv2d_discontiguous_weight, test/nn/test_convolution.py::TestConvolutionNN::test_conv3d_issue_120406, test/nn/test_convolution.py::TestConvolutionNN::test_conv3d_overflow_values, test/nn/test_convolution.py::TestConvolutionNN::test_conv_aten_invalid_groups, test/nn/test_convolution.py::TestConvolutionNN::test_conv_backcompat, 
test/nn/test_convolution.py::TestConvolutionNN::test_conv_cudnn_memory_layout_dominance, test/nn/test_convolution.py::TestConvolutionNN::test_conv_invalid_groups, test/nn/test_convolution.py::TestConvolutionNN::test_conv_modules_raise_error_on_incorrect_input_size, test/nn/test_convolution.py::TestConvolutionNN::test_conv_padding_mode, test/nn/test_convolution.py::TestConvolutionNN::test_conv_shapecheck, test/nn/test_convolution.py::TestConvolutionNN::test_conv_tbc, test/nn/test_convolution.py::TestConvolutionNN::test_cudnn_non_contiguous, test/nn/test_convolution.py::TestConvolutionNN::test_cudnn_noncontiguous_weight, test/nn/test_convolution.py::TestConvolutionNN::test_cudnn_not_mutate_stride, test/nn/test_convolution.py::TestConvolutionNN::test_functional_grad_conv, test/nn/test_convolution.py::TestConvolutionNN::test_functional_grad_conv2d, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv1d_input, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv1d_weight, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv2d_input, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv2d_weight, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv3d_input, test/nn/test_convolution.py::TestConvolutionNN::test_grad_conv3d_weight, test/nn/test_convolution.py::TestConvolutionNN::test_grouped_conv_cudnn_nhwc_support, test/nn/test_convolution.py::TestConvolutionNN::test_huge_padding, test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv1d, test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv2d, test/nn/test_convolution.py::TestConvolutionNN::test_invalid_conv3d, test/nn/test_convolution.py::TestConvolutionNN::test_mismatch_shape_conv2d, test/nn/test_convolution.py::TestConvolutionNN::test_nnpack_conv, test/nn/test_convolution.py::TestConvolutionNN::test_permute_conv2d_issue_120211, test/nn/test_convolution.py::TestConvolutionNN::test_thnn_conv_strided_padded_dilated, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_backward_depthwise_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_backward_depthwise_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_depthwise_naive_groups_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_depthwise_naive_groups_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_depthwise_naive_groups_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_1_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_2_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_complex64, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_deterministic_cudnn_dilation_3_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_large_workspace_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_large_workspace_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_large_workspace_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_size_1_kernel_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv3d_depthwise_naive_groups_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv3d_depthwise_naive_groups_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv3d_depthwise_naive_groups_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose2d_large_output_padding_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose2d_large_output_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose2d_size_1_kernel_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_ConvTranspose3d_size_1_kernel_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_contig_wrong_stride_cudnn_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_backward_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_backward_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_same_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_backward_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_backward_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_valid_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_same_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_same_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_valid_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv1d_vs_scipy_mode_valid_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_no_grad_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_backward_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_backward_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_same_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_backward_cuda_complex64, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_backward_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_valid_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_same_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_same_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_valid_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv2d_vs_scipy_mode_valid_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_64bit_indexing_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_cudnn_broken_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_backward_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_backward_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_same_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_backward_cuda_complex128, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_backward_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_valid_padding_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_same_cuda_complex64, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_same_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_valid_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv3d_vs_scipy_mode_valid_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_convTranspose_empty_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cuda_depthwise3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_cudnn3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch3d_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_batch_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_empty_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen3d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise1d_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_miopen_depthwise3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_cpu_input_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_cpu_input_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_cpu_input_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn3d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch2d_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel2d_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_batch_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_mkldnn_empty_channel3d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_dilated_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow1d_transposed_has_bias_True_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_False_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_dilated_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_False_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow2d_transposed_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_True_contiguous_False_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cpu_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_False_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_cuda_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_False_strided_True_contiguous_True_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_False_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_False_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_True_contiguous_False_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_backend_slow3d_dilated_has_bias_True_strided_True_contiguous_True_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_contiguous_for_oneDNN_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_mismatch_memory_format_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_ndhwc_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_ndhwc_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_support_cuda_bfloat16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_support_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_support_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_cudnn_nhwc_support_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_groups_cuda, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_no_bias_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_stride_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_double_backward_strided_with_3D_input_and_weight_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_empty_channel_cuda_complex64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_empty_channel_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_ic1_channels_last_for_oneDNN_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_large_batch_1_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_large_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_large_nosplit_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_noncontig_weights_and_bias_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_noncontig_weights_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_thnn_nhwc_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_thnn_nhwc_cuda_float64, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_transpose_with_output_size_and_no_batch_dim_ConvTranspose2d_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_transpose_with_output_size_and_no_batch_dim_ConvTranspose3d_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_conv_transposed_large_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_convert_conv2d_weight_memory_format_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_convert_conv3d_weight_memory_format_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_add_relu_cuda_float16, 
test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_add_relu_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_relu_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_cudnn_convolution_relu_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_depthwise_conv_64bit_indexing_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_group_convTranspose_empty_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_group_conv_empty_cuda, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_noncontig_conv_grad_cuda_float16, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_noncontig_conv_grad_cuda_float32, test/nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_noncontig_conv_grad_cuda_float64
2025-12-04T16:12:11.8350007Z 
2025-12-04T16:12:11.8350380Z Finished nn/test_convolution 1/1 ... [2025-12-04 16:12:11.753100][25089.363006733], took 0.76min
2025-12-04T16:12:11.8351579Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/nn.test_convolution/nn.test_convolution-b018917052e39f95.xml
2025-12-04T16:12:11.9335512Z Running test_indexing 1/1 ... [2025-12-04 16:12:11.933279][25089.543186008]
2025-12-04T16:12:11.9337817Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:12:11.9339534Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_indexing.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:12:11.933705]
2025-12-04T16:12:40.8373849Z 
2025-12-04T16:12:40.8374805Z test_indexing 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_indexing_1.1_2824065dc4dc1509_.log
2025-12-04T16:12:40.8443630Z Running 186 items in this shard: test/test_indexing.py::TestIndexingCUDA::test_advancedindex_big_cuda, test/test_indexing.py::TestIndexingCUDA::test_advancedindex_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_advancedindex_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_basic_advanced_combined_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_indices_accumulate_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_bool_mask_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask2d_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask_accumulate_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_byte_tensor_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_cpu_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_cuda_broadcast_index_use_deterministic_algorithms_cuda, test/test_indexing.py::TestIndexingCUDA::test_ellipsis_tensor_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_ndim_index_bool_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_ndim_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_empty_slice_cuda, test/test_indexing.py::TestIndexingCUDA::test_errors_index_copy_cuda, test/test_indexing.py::TestIndexingCUDA::test_gather_take_along_dim_cross_device_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_getitem_scalars_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_add_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_complex64, 
test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_copy_scalars_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float16, 
test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_fill_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_getitem_copy_bools_slices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_ind_dtype_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_limits_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_duplicate_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_empty_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_expanded_values_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_large_tensor_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_accumulate_non_contiguous_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_deterministic_with_optional_tensors_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_large_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_non_accumulate_deterministic_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float8_e4m3fn, test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_float8_e5m2, 
test/test_indexing.py::TestIndexingCUDA::test_index_put_src_datatype_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amax_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_amin_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_float64, 
test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_mean_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_reduce_reduce_prod_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_scalar_with_bool_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_complex128, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_complex64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e4m3fn, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e4m3fnuz, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e5m2, 
test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_float8_e5m2fnuz, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int16, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int32, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_int8, test/test_indexing.py::TestIndexingCUDA::test_index_select_cuda_uint8, test/test_indexing.py::TestIndexingCUDA::test_index_setitem_bools_slices_cuda, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_bfloat16, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_bool, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_float16, test/test_indexing.py::TestIndexingCUDA::test_index_src_datatype_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_int_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_int_indices2d_cuda, test/test_indexing.py::TestIndexingCUDA::test_int_indices_broadcast_cuda, test/test_indexing.py::TestIndexingCUDA::test_int_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_invalid_device_cuda, test/test_indexing.py::TestIndexingCUDA::test_invalid_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_jit_indexing_cuda, test/test_indexing.py::TestIndexingCUDA::test_list_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_multi_dimensional_bool_mask_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_multi_dimensional_bool_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_multiple_bool_indices_cuda, test/test_indexing.py::TestIndexingCUDA::test_multiple_byte_mask_cuda, test/test_indexing.py::TestIndexingCUDA::test_multiple_int_cuda, test/test_indexing.py::TestIndexingCUDA::test_none_cuda, test/test_indexing.py::TestIndexingCUDA::test_out_of_bound_index_cuda, test/test_indexing.py::TestIndexingCUDA::test_set_item_to_scalar_tensor_cuda, 
test/test_indexing.py::TestIndexingCUDA::test_setitem_expansion_error_cuda, test/test_indexing.py::TestIndexingCUDA::test_setitem_scalars_cuda, test/test_indexing.py::TestIndexingCUDA::test_single_int_cuda, test/test_indexing.py::TestIndexingCUDA::test_step_assignment_cuda, test/test_indexing.py::TestIndexingCUDA::test_step_cuda, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_invalid_cuda_float32, test/test_indexing.py::TestIndexingCUDA::test_take_along_dim_invalid_cuda_int64, test/test_indexing.py::TestIndexingCUDA::test_unravel_index_errors_cuda, test/test_indexing.py::TestIndexingCUDA::test_variable_slicing_cuda, test/test_indexing.py::TestIndexingCUDA::test_zero_dim_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_assignment_value_mismatch_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_alldims_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_onedim_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_twodim_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_weirdness_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_indexing_weirdness_tensors_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_list_indexing_cuda, test/test_indexing.py::NumpyTestsCUDA::test_boolean_shape_mismatch_cuda, test/test_indexing.py::NumpyTestsCUDA::test_broadcast_subspace_cuda, test/test_indexing.py::NumpyTestsCUDA::test_broaderrors_indexing_cuda, test/test_indexing.py::NumpyTestsCUDA::test_ellipsis_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_empty_fancy_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_empty_tuple_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_everything_returns_views_cuda, test/test_indexing.py::NumpyTestsCUDA::test_index_is_larger_cuda, 
test/test_indexing.py::NumpyTestsCUDA::test_index_no_floats_cuda, test/test_indexing.py::NumpyTestsCUDA::test_none_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_single_bool_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_single_int_index_cuda, test/test_indexing.py::NumpyTestsCUDA::test_trivial_fancy_out_of_bounds_cuda, test/test_indexing.py::NumpyTestsCUDA::test_truncate_leading_1s_cuda
2025-12-04T16:12:40.8511071Z 
2025-12-04T16:12:40.8511372Z Finished test_indexing 1/1 ... [2025-12-04 16:12:40.837460][25118.447367855], took 0.48min
2025-12-04T16:12:40.8799781Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_indexing/test_indexing-f48226185e6ca57a.xml
2025-12-04T16:12:40.9790836Z Running torch_np/numpy_tests/fft/test_pocketfft 1/1 ... [2025-12-04 16:12:40.978776][25118.588683517]
2025-12-04T16:12:40.9791711Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:12:40.9794507Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/fft/test_pocketfft.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:12:40.979191]
2025-12-04T16:12:49.3062228Z 
2025-12-04T16:12:49.3063516Z torch_np/numpy_tests/fft/test_pocketfft 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.fft.test_pocketfft_1.1_5bba81624a9a4669_.log
2025-12-04T16:12:49.3100387Z Running 79 items in this shard: test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTShift::test_fft_n, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_all_1d_norm_preserving, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_axes_op3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_dtypes_dtype2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft3, 
test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype0_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype1_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft3, 
test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype2_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft3, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_F_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft1, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft3, 
test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft4, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fft_with_order_dtype3_order_non-contiguous_fft5, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_fftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_hfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_identity, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm0, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_backward, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_forward, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifft_norm_ortho, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ifftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_ihfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_irfftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfft2, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFT1D::test_rfftn, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_fft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_ifft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_irfft, test/torch_np/numpy_tests/fft/test_pocketfft.py::TestFFTThreadSafe::test_rfft
2025-12-04T16:12:49.3136941Z 
2025-12-04T16:12:49.3137376Z Finished torch_np/numpy_tests/fft/test_pocketfft 1/1 ... [2025-12-04 16:12:49.306159][25126.916065829], took 0.14min
2025-12-04T16:12:49.3493214Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bea76ae62a6a548e.xml
2025-12-04T16:12:49.4339597Z Running torch_np/numpy_tests/lib/test_shape_base_ 1/1 ... [2025-12-04 16:12:49.433621][25127.043529022]
2025-12-04T16:12:49.4340261Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:12:49.4343308Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'torch_np/numpy_tests/lib/test_shape_base_.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:12:49.434054]
2025-12-04T16:12:55.3567367Z 
2025-12-04T16:12:55.3568856Z torch_np/numpy_tests/lib/test_shape_base_ 1/1 was successful, full logs can be found in artifacts with path test/test-reports/torch_np.numpy_tests.lib.test_shape_base__1.1_462d874ba4c079f0_.log
2025-12-04T16:12:55.3599359Z Running 73 items in this shard: test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_argequivalent, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_broadcast, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_empty, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTakeAlongAxis::test_invalid, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestPutAlongAxis::test_broadcast, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestPutAlongAxis::test_replace_max, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_0d_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_3d, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_axis_insertion, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_axis_insertion_ma, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_empty, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_scalar_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_simple, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_simple101, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_tuple_func1d, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyAlongAxis::test_with_iterable_object, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestApplyOverAxes::test_simple, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_axis_out_of_range, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_axis_tuple, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_functionality, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestExpandDims::test_repeated_axis, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_index_split_high_bound, 
test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_index_split_low_bound, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_index_split_simple, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_0_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_cols, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_default, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_rows, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestArraySplit::test_integer_split_2D_rows_greater_max_int32, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSplit::test_equal_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSplit::test_unequal_split, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_1D_arrays, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_2D_arrays, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_generator, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestColumnStack::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_2D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_2D_array2, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_generator, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDstack::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_2D_array, 
test/torch_np/numpy_tests/lib/test_shape_base_.py::TestHsplit::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_2D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestVsplit::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_0D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_1D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_2D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_3D_array, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestDsplit::test_non_iterable, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_basic, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_basic_2, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_axis, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_axis_handling, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_contiguous, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestSqueeze::test_squeeze_type, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_basic, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a0_shape_b0, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a1_shape_b1, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a2_shape_b2, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a3_shape_b3, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a4_shape_b4, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestKron::test_kron_shape_shape_a5_shape_b5, 
test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_basic, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_empty, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_kroncompare, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestTile::test_tile_one_repetition_on_array_gh4679, test/torch_np/numpy_tests/lib/test_shape_base_.py::TestMayShareMemory::test_basic
2025-12-04T16:12:55.3629254Z 
2025-12-04T16:12:55.3629676Z Finished torch_np/numpy_tests/lib/test_shape_base_ 1/1 ... [2025-12-04 16:12:55.356642][25132.9665502], took 0.10min
2025-12-04T16:12:55.4003272Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_shape_base_/torch_np.numpy_tests.lib.test_shape_base_-4cf3761fefa68714.xml
2025-12-04T16:12:55.4352201Z Running test_cpp_extensions_jit 1/1 ... [2025-12-04 16:12:55.434989][25133.044897475]
2025-12-04T16:12:55.4352765Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:12:55.4356070Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'test_cpp_extensions_jit.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:12:55.435369]
2025-12-04T16:16:42.9165179Z 
2025-12-04T16:16:42.9166228Z test_cpp_extensions_jit 1/1 was successful, full logs can be found in artifacts with path test/test-reports/test_cpp_extensions_jit_1.1_53eadff4adfe6cf3_.log
2025-12-04T16:16:42.9184857Z Running 35 items in this shard: test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_aoti_torch_call_dispatcher, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_autograd_from_cpp, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_compilation_error_formatting, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_has_same_output_as_python, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_has_up_to_date_attributes, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_python_inter_op, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cpp_frontend_module_python_inter_op_with_cuda, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cuda_arch_flags_default_gencode, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cuda_arch_flags_non_default_gencode, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_cuda_pluggable_allocator_include, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_custom_compound_op_autograd, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_custom_functorch_error, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_gen_extension_h_pch, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_half_support, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_custom_op_cuda, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_cuda, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_multiple_sources_and_no_functions, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_throws_when_functions_is_bad, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_with_functions_as_dict, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_with_functions_as_list, 
test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_inline_jit_compile_extension_xpu, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_compile_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_cuda_archflags, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_cuda_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_cudnn_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_xpu_archlists, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_jit_xpu_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_lenient_flag_handling_in_jit_extensions, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_load_with_non_platform_default_encoding, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_mps_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_pch_command_injection, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_reload_jit_extension, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_returns_shared_library_path_when_is_python_module_is_true, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_set_default_type_also_changes_aten_default_type, test/test_cpp_extensions_jit.py::TestCppExtensionJIT::test_warning
2025-12-04T16:16:42.9200791Z 
2025-12-04T16:16:42.9201316Z Finished test_cpp_extensions_jit 1/1 ... [2025-12-04 16:16:42.916366][25360.526274638], took 3.79min
2025-12-04T16:16:42.9601827Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_cpp_extensions_jit/test_cpp_extensions_jit-2038af5833d07a07.xml
2025-12-04T16:16:44.5222298Z Uploading artifacts took 1.49 seconds
2025-12-04T16:16:44.5226841Z Running profiler/test_python_tracer 1/1 ... [2025-12-04 16:16:44.522489][25362.132397219]
2025-12-04T16:16:44.5227475Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:16:44.5232234Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'profiler/test_python_tracer.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:16:44.522960]
2025-12-04T16:16:55.1018689Z 
2025-12-04T16:16:55.1019824Z profiler/test_python_tracer 1/1 was successful, full logs can be found in artifacts with path test/test-reports/profiler.test_python_tracer_1.1_2f036554f4a33837_.log
2025-12-04T16:16:55.1022013Z Running 3 items in this shard: test/profiler/test_python_tracer.py::TestPythonTracer::test_method_with_c_function, test/profiler/test_python_tracer.py::TestPythonTracer::test_monitoring_callback, test/profiler/test_python_tracer.py::TestPythonTracer::test_unexpected_c_return_events
2025-12-04T16:16:55.1023684Z 
2025-12-04T16:16:55.1024069Z Finished profiler/test_python_tracer 1/1 ... [2025-12-04 16:16:55.101665][25372.711574683], took 0.18min
2025-12-04T16:16:55.1450409Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/profiler.test_python_tracer/profiler.test_python_tracer-4e1c7f97ddacb52a.xml
2025-12-04T16:16:55.2549110Z Running cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility 1/1 ... [2025-12-04 16:16:55.254550][25372.864459302]
2025-12-04T16:16:55.2549972Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:16:55.2552779Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:16:55.255005]
2025-12-04T16:17:22.4079934Z 
2025-12-04T16:17:22.4081469Z cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility 1/1 was successful, full logs can be found in artifacts with path test/test-reports/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility_1.1_38e9912ded2d6880_.log
2025-12-04T16:17:22.4101625Z Running 23 items in this shard: test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_get_any_data_ptr_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_get_template_any_data_ptr_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_make_tensor_clones_and_call_foreach_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_mv_tensor_accessor_cpu_works_with_2_9, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_mv_tensor_accessor_cuda_works_with_2_9, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my__foreach_mul__requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my__foreach_mul_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my_empty_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my_reshape_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my_shape_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my_string_op_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_my_view_requires_2_10, 
test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_cublas_handle_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_cuda_stream_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_constructor_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_equality_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_index_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_is_cpu_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_is_cuda_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_device_set_index_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_get_num_threads_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_parallel_for_requires_2_10, test/cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility.py::FunctionVersionCompatibilityTest::test_test_tensor_device_requires_2_10
2025-12-04T16:17:22.4120314Z 
2025-12-04T16:17:22.4120929Z Finished cpp_extensions/libtorch_agnostic_2_10_extension/test_version_compatibility 1/1 ... [2025-12-04 16:17:22.407791][25400.017699517], took 0.45min
2025-12-04T16:17:22.4517704Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility-c0abede9e59e118f.xml
2025-12-04T16:17:22.5423640Z Running distributions/test_distributions 1/1 ... [2025-12-04 16:17:22.542076][25400.151984717]
2025-12-04T16:17:22.5424273Z SCRIBE_GRAPHQL_ACCESS_TOKEN is set
2025-12-04T16:17:22.5427429Z Executing ['/opt/conda/envs/py_3.10/bin/python', '-bb', 'distributions/test_distributions.py', '--shard-id=1', '--num-shards=1', '-v', '-vv', '-rfEX', '-p', 'no:xdist', '--use-pytest', '-x', '--reruns=2', '--import-slow-tests', '--import-disabled-tests'] ... [2025-12-04 16:17:22.542480]
2025-12-04T16:18:40.2117689Z 
2025-12-04T16:18:40.2120765Z distributions/test_distributions 1/1 was successful, full logs can be found in artifacts with path test/test-reports/distributions.test_distributions_1.1_10129d86baeaadf5_.log
2025-12-04T16:18:40.2228680Z Running 230 items in this shard: test/distributions/test_distributions.py::TestDistributions::test_argmax_relaxed_categorical, test/distributions/test_distributions.py::TestDistributions::test_bernoulli, test/distributions/test_distributions.py::TestDistributions::test_bernoulli_3d, test/distributions/test_distributions.py::TestDistributions::test_bernoulli_enumerate_support, test/distributions/test_distributions.py::TestDistributions::test_beta_log_prob, test/distributions/test_distributions.py::TestDistributions::test_beta_sample, test/distributions/test_distributions.py::TestDistributions::test_beta_shape, test/distributions/test_distributions.py::TestDistributions::test_beta_underflow, test/distributions/test_distributions.py::TestDistributions::test_beta_underflow_gpu, test/distributions/test_distributions.py::TestDistributions::test_binomial, test/distributions/test_distributions.py::TestDistributions::test_binomial_bfloat16, test/distributions/test_distributions.py::TestDistributions::test_binomial_enumerate_support, test/distributions/test_distributions.py::TestDistributions::test_binomial_extreme_vals, test/distributions/test_distributions.py::TestDistributions::test_binomial_half, test/distributions/test_distributions.py::TestDistributions::test_binomial_log_prob_and_entropy, test/distributions/test_distributions.py::TestDistributions::test_binomial_log_prob_vectorized_count, test/distributions/test_distributions.py::TestDistributions::test_binomial_sample, test/distributions/test_distributions.py::TestDistributions::test_binomial_stable, test/distributions/test_distributions.py::TestDistributions::test_binomial_vectorized_count, test/distributions/test_distributions.py::TestDistributions::test_categorical_1d, test/distributions/test_distributions.py::TestDistributions::test_categorical_2d, test/distributions/test_distributions.py::TestDistributions::test_categorical_enumerate_support, 
test/distributions/test_distributions.py::TestDistributions::test_cauchy, test/distributions/test_distributions.py::TestDistributions::test_cdf_icdf_inverse, test/distributions/test_distributions.py::TestDistributions::test_cdf_log_prob, test/distributions/test_distributions.py::TestDistributions::test_chi2_sample, test/distributions/test_distributions.py::TestDistributions::test_chi2_shape, test/distributions/test_distributions.py::TestDistributions::test_continuous_bernoulli, test/distributions/test_distributions.py::TestDistributions::test_continuous_bernoulli_3d, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_log_prob, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_log_prob_zero, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_mode, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_sample, test/distributions/test_distributions.py::TestDistributions::test_dirichlet_shape, test/distributions/test_distributions.py::TestDistributions::test_distribution_expand, test/distributions/test_distributions.py::TestDistributions::test_distribution_subclass_expand, test/distributions/test_distributions.py::TestDistributions::test_enumerate_support_type, test/distributions/test_distributions.py::TestDistributions::test_exponential, test/distributions/test_distributions.py::TestDistributions::test_exponential_sample, test/distributions/test_distributions.py::TestDistributions::test_fishersnedecor, test/distributions/test_distributions.py::TestDistributions::test_fishersnedecor_sample, test/distributions/test_distributions.py::TestDistributions::test_gamma_gpu_sample, test/distributions/test_distributions.py::TestDistributions::test_gamma_gpu_shape, test/distributions/test_distributions.py::TestDistributions::test_gamma_log_prob_at_boundary, test/distributions/test_distributions.py::TestDistributions::test_gamma_sample, 
test/distributions/test_distributions.py::TestDistributions::test_gamma_shape, test/distributions/test_distributions.py::TestDistributions::test_generalized_pareto, test/distributions/test_distributions.py::TestDistributions::test_generalized_pareto_sample, test/distributions/test_distributions.py::TestDistributions::test_geometric, test/distributions/test_distributions.py::TestDistributions::test_geometric_log_prob_and_entropy, test/distributions/test_distributions.py::TestDistributions::test_geometric_sample, test/distributions/test_distributions.py::TestDistributions::test_gumbel, test/distributions/test_distributions.py::TestDistributions::test_gumbel_sample, test/distributions/test_distributions.py::TestDistributions::test_halfcauchy, test/distributions/test_distributions.py::TestDistributions::test_halfnormal, test/distributions/test_distributions.py::TestDistributions::test_halfnormal_logprob, test/distributions/test_distributions.py::TestDistributions::test_halfnormal_sample, test/distributions/test_distributions.py::TestDistributions::test_has_examples, test/distributions/test_distributions.py::TestDistributions::test_independent_expand, test/distributions/test_distributions.py::TestDistributions::test_independent_shape, test/distributions/test_distributions.py::TestDistributions::test_invalid_parameter_broadcasting, test/distributions/test_distributions.py::TestDistributions::test_inversegamma, test/distributions/test_distributions.py::TestDistributions::test_inversegamma_sample, test/distributions/test_distributions.py::TestDistributions::test_kumaraswamy_mean_variance, test/distributions/test_distributions.py::TestDistributions::test_kumaraswamy_shape, test/distributions/test_distributions.py::TestDistributions::test_laplace, test/distributions/test_distributions.py::TestDistributions::test_laplace_sample, test/distributions/test_distributions.py::TestDistributions::test_lazy_property_grad, 
test/distributions/test_distributions.py::TestDistributions::test_lkj_cholesky_log_prob, test/distributions/test_distributions.py::TestDistributions::test_logisticnormal, test/distributions/test_distributions.py::TestDistributions::test_logisticnormal_logprob, test/distributions/test_distributions.py::TestDistributions::test_logisticnormal_sample, test/distributions/test_distributions.py::TestDistributions::test_lognormal, test/distributions/test_distributions.py::TestDistributions::test_lognormal_logprob, test/distributions/test_distributions.py::TestDistributions::test_lognormal_sample, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_log_prob, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_moments, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_properties, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_sample, test/distributions/test_distributions.py::TestDistributions::test_lowrank_multivariate_normal_shape, test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_binomial_log_prob, test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_normal_log_prob, test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_sample, test/distributions/test_distributions.py::TestDistributions::test_mixture_same_family_shape, test/distributions/test_distributions.py::TestDistributions::test_mode, test/distributions/test_distributions.py::TestDistributions::test_multinomial_1d, test/distributions/test_distributions.py::TestDistributions::test_multinomial_1d_log_prob_and_entropy, test/distributions/test_distributions.py::TestDistributions::test_multinomial_2d, test/distributions/test_distributions.py::TestDistributions::test_multinomial_sequential_draw, 
test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_log_prob, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_moments, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_properties, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_sample, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_shape, test/distributions/test_distributions.py::TestDistributions::test_multivariate_normal_stable_with_precision_matrix, test/distributions/test_distributions.py::TestDistributions::test_negative_binomial, test/distributions/test_distributions.py::TestDistributions::test_negative_binomial_log_prob, test/distributions/test_distributions.py::TestDistributions::test_negative_binomial_log_prob_vectorized_count, test/distributions/test_distributions.py::TestDistributions::test_normal, test/distributions/test_distributions.py::TestDistributions::test_normal_sample, test/distributions/test_distributions.py::TestDistributions::test_one_hot_categorical_1d, test/distributions/test_distributions.py::TestDistributions::test_one_hot_categorical_2d, test/distributions/test_distributions.py::TestDistributions::test_one_hot_categorical_enumerate_support, test/distributions/test_distributions.py::TestDistributions::test_pareto, test/distributions/test_distributions.py::TestDistributions::test_pareto_sample, test/distributions/test_distributions.py::TestDistributions::test_poisson_forward_ad, test/distributions/test_distributions.py::TestDistributions::test_poisson_gpu_sample, test/distributions/test_distributions.py::TestDistributions::test_poisson_log_prob, test/distributions/test_distributions.py::TestDistributions::test_poisson_sample, test/distributions/test_distributions.py::TestDistributions::test_poisson_shape, test/distributions/test_distributions.py::TestDistributions::test_relaxed_bernoulli, 
test/distributions/test_distributions.py::TestDistributions::test_relaxed_one_hot_categorical_1d, test/distributions/test_distributions.py::TestDistributions::test_relaxed_one_hot_categorical_2d, test/distributions/test_distributions.py::TestDistributions::test_repr, test/distributions/test_distributions.py::TestDistributions::test_rounded_relaxed_bernoulli, test/distributions/test_distributions.py::TestDistributions::test_rsample_requires_grad, test/distributions/test_distributions.py::TestDistributions::test_sample_detached, test/distributions/test_distributions.py::TestDistributions::test_studentT, test/distributions/test_distributions.py::TestDistributions::test_studentT_log_prob, test/distributions/test_distributions.py::TestDistributions::test_studentT_sample, test/distributions/test_distributions.py::TestDistributions::test_support_attributes, test/distributions/test_distributions.py::TestDistributions::test_torch_binomial_dtype_errors, test/distributions/test_distributions.py::TestDistributions::test_uniform, test/distributions/test_distributions.py::TestDistributions::test_valid_parameter_broadcasting, test/distributions/test_distributions.py::TestDistributions::test_vonmises_logprob, test/distributions/test_distributions.py::TestDistributions::test_vonmises_sample, test/distributions/test_distributions.py::TestDistributions::test_wishart_log_prob, test/distributions/test_distributions.py::TestDistributions::test_wishart_moments, test/distributions/test_distributions.py::TestDistributions::test_wishart_properties, test/distributions/test_distributions.py::TestDistributions::test_wishart_sample, test/distributions/test_distributions.py::TestDistributions::test_wishart_shape, test/distributions/test_distributions.py::TestDistributions::test_wishart_stable_with_precision_matrix, test/distributions/test_distributions.py::TestDistributions::test_zero_excluded_binomial, test/distributions/test_distributions.py::TestRsample::test_beta_wrt_alpha, 
test/distributions/test_distributions.py::TestRsample::test_beta_wrt_beta, test/distributions/test_distributions.py::TestRsample::test_chi2, test/distributions/test_distributions.py::TestRsample::test_dirichlet_multivariate, test/distributions/test_distributions.py::TestRsample::test_dirichlet_on_diagonal, test/distributions/test_distributions.py::TestRsample::test_dirichlet_tangent_field, test/distributions/test_distributions.py::TestRsample::test_gamma, test/distributions/test_distributions.py::TestDistributionShapes::test_bernoulli_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_bernoulli_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_beta_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_beta_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_binomial_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_binomial_shape_vectorized_n, test/distributions/test_distributions.py::TestDistributionShapes::test_categorical_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_cauchy_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_cauchy_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_chi2_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_chi2_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_continuous_bernoulli_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_continuous_bernoulli_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_dirichlet_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_entropy_shape, 
test/distributions/test_distributions.py::TestDistributionShapes::test_exponential_shape_scalar_param, test/distributions/test_distributions.py::TestDistributionShapes::test_exponential_shape_tensor_param, test/distributions/test_distributions.py::TestDistributionShapes::test_gamma_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_gamma_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_geometric_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_geometric_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_gumbel_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_halfcauchy_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_halfcauchy_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_kumaraswamy_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_laplace_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_laplace_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_mixture_same_family_mean_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_mixture_same_family_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_multinomial_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_normal_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_normal_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_one_hot_categorical_shape, test/distributions/test_distributions.py::TestDistributionShapes::test_pareto_shape_scalar_params, 
test/distributions/test_distributions.py::TestDistributionShapes::test_studentT_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_studentT_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_uniform_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_uniform_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_vonmises_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_vonmises_shape_tensor_params, test/distributions/test_distributions.py::TestDistributionShapes::test_weibull_scale_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_wishart_shape_scalar_params, test/distributions/test_distributions.py::TestDistributionShapes::test_wishart_shape_tensor_params, test/distributions/test_distributions.py::TestKL::test_entropy_exponential_family, test/distributions/test_distributions.py::TestKL::test_entropy_monte_carlo, test/distributions/test_distributions.py::TestKL::test_kl_edgecases, test/distributions/test_distributions.py::TestKL::test_kl_exponential_family, test/distributions/test_distributions.py::TestKL::test_kl_infinite, test/distributions/test_distributions.py::TestKL::test_kl_lowrank_multivariate_normal, test/distributions/test_distributions.py::TestKL::test_kl_lowrank_multivariate_normal_batched, test/distributions/test_distributions.py::TestKL::test_kl_monte_carlo, test/distributions/test_distributions.py::TestKL::test_kl_multivariate_normal, test/distributions/test_distributions.py::TestKL::test_kl_multivariate_normal_batched, test/distributions/test_distributions.py::TestKL::test_kl_multivariate_normal_batched_broadcasted, test/distributions/test_distributions.py::TestKL::test_kl_shape, test/distributions/test_distributions.py::TestKL::test_kl_transformed, 
test/distributions/test_distributions.py::TestConstraints::test_params_constraints, test/distributions/test_distributions.py::TestConstraints::test_support_constraints, test/distributions/test_distributions.py::TestNumericalStability::test_bernoulli_gradient, test/distributions/test_distributions.py::TestNumericalStability::test_bernoulli_with_logits_overflow, test/distributions/test_distributions.py::TestNumericalStability::test_bernoulli_with_logits_underflow, test/distributions/test_distributions.py::TestNumericalStability::test_categorical_log_prob, test/distributions/test_distributions.py::TestNumericalStability::test_categorical_log_prob_with_logits, test/distributions/test_distributions.py::TestNumericalStability::test_continuous_bernoulli_gradient, test/distributions/test_distributions.py::TestNumericalStability::test_continuous_bernoulli_with_logits_overflow, test/distributions/test_distributions.py::TestNumericalStability::test_continuous_bernoulli_with_logits_underflow, test/distributions/test_distributions.py::TestNumericalStability::test_multinomial_log_prob, test/distributions/test_distributions.py::TestNumericalStability::test_multinomial_log_prob_with_logits, test/distributions/test_distributions.py::TestLazyLogitsInitialization::test_lazy_logits_initialization, test/distributions/test_distributions.py::TestLazyLogitsInitialization::test_lazy_probs_initialization, test/distributions/test_distributions.py::TestAgainstScipy::test_cdf, test/distributions/test_distributions.py::TestAgainstScipy::test_icdf, test/distributions/test_distributions.py::TestAgainstScipy::test_mean, test/distributions/test_distributions.py::TestAgainstScipy::test_variance_stddev, test/distributions/test_distributions.py::TestFunctors::test_cat_event_dim, test/distributions/test_distributions.py::TestFunctors::test_cat_transform, test/distributions/test_distributions.py::TestFunctors::test_cat_transform_non_uniform, 
test/distributions/test_distributions.py::TestFunctors::test_stack_transform, test/distributions/test_distributions.py::TestValidation::test_invalid, test/distributions/test_distributions.py::TestValidation::test_invalid_log_probs_arg, test/distributions/test_distributions.py::TestValidation::test_valid, test/distributions/test_distributions.py::TestValidation::test_warning_unimplemented_constraints, test/distributions/test_distributions.py::TestJit::test_cdf, test/distributions/test_distributions.py::TestJit::test_entropy, test/distributions/test_distributions.py::TestJit::test_enumerate_support, test/distributions/test_distributions.py::TestJit::test_log_prob, test/distributions/test_distributions.py::TestJit::test_mean, test/distributions/test_distributions.py::TestJit::test_rsample, test/distributions/test_distributions.py::TestJit::test_sample, test/distributions/test_distributions.py::TestJit::test_variance
2025-12-04T16:18:40.2335399Z 
2025-12-04T16:18:40.2335815Z Finished distributions/test_distributions 1/1 ... [2025-12-04 16:18:40.211950][25477.821855577], took 1.29min
2025-12-04T16:18:40.2563004Z Parsing testcases for test report: /var/lib/jenkins/workspace/test/test-reports/python-pytest/distributions.test_distributions/distributions.test_distributions-390f18d46cafc91e.xml
2025-12-04T16:18:47.2843273Z Running test batch 'tests to run' cost 23636.91 seconds
2025-12-04T16:18:47.2859440Z Emitting td_test_failure_stats_v2
2025-12-04T16:18:47.2863489Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9637688d12c11f0bad30242ac110002
2025-12-04T16:18:47.3909731Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9637688d12c11f0bad30242ac110002 
2025-12-04T16:18:47.3925108Z Emitting td_test_failure_stats_v2
2025-12-04T16:18:47.3928166Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e973b566d12c11f0bad30242ac110002
2025-12-04T16:18:47.4497742Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e973b566d12c11f0bad30242ac110002 
2025-12-04T16:18:47.4513717Z Emitting td_test_failure_stats_v2
2025-12-04T16:18:47.4516374Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e97cafccd12c11f0bad30242ac110002
2025-12-04T16:18:47.4885409Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e97cafccd12c11f0bad30242ac110002 
2025-12-04T16:18:47.4900628Z Emitting td_test_failure_stats_v2
2025-12-04T16:18:47.4903727Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e982982ed12c11f0bad30242ac110002
2025-12-04T16:18:47.5285845Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e982982ed12c11f0bad30242ac110002 
2025-12-04T16:18:47.5301579Z Emitting td_test_failure_stats_v2
2025-12-04T16:18:47.5303976Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e988b416d12c11f0bad30242ac110002
2025-12-04T16:18:47.5646659Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e988b416d12c11f0bad30242ac110002 
2025-12-04T16:18:47.5661550Z Emitting td_test_failure_stats_v2
2025-12-04T16:18:47.5663930Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e98e3292d12c11f0bad30242ac110002
2025-12-04T16:18:47.6403423Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e98e3292d12c11f0bad30242ac110002 
2025-12-04T16:18:47.6418491Z Emitting td_test_failure_stats_v2
2025-12-04T16:18:47.6420749Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e999bf36d12c11f0bad30242ac110002
2025-12-04T16:18:47.6962786Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e999bf36d12c11f0bad30242ac110002 
2025-12-04T16:18:47.6978642Z Emitting td_test_failure_stats_v2
2025-12-04T16:18:47.6980889Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9a24ac0d12c11f0bad30242ac110002
2025-12-04T16:18:47.7304676Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9a24ac0d12c11f0bad30242ac110002 
2025-12-04T16:18:47.7320133Z Emitting td_test_failure_stats_v2
2025-12-04T16:18:47.7322236Z Writing 1 documents to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9a7806cd12c11f0bad30242ac110002
2025-12-04T16:18:47.8066677Z Done! Finish writing document to S3 ossci-raw-job-status/ossci_uploaded_metrics/td_test_failure_stats_v2_1764865127_e9a7806cd12c11f0bad30242ac110002 
2025-12-04T16:18:47.8068174Z inductor/test_aot_inductor 4/6 failed!
2025-12-04T16:18:47.8068580Z inductor/test_kernel_benchmark 1/1 failed!
2025-12-04T16:18:47.8069087Z inductor/test_pattern_matcher 1/1 failed!
2025-12-04T16:18:47.8069469Z inductor/test_cuda_repro 1/1 failed!
2025-12-04T16:18:47.8069870Z inductor/test_cuda_select_algorithm 4/5 failed!
2025-12-04T16:18:47.8070290Z inductor/test_native_matmul 1/2 failed!
2025-12-04T16:18:47.8070647Z inductor/test_memory 1/1 failed!
2025-12-04T16:18:47.8071006Z inductor/test_unbacked_symints 1/1 failed!
2025-12-04T16:18:47.8071425Z inductor/test_mix_order_reduction 1/2 failed!
2025-12-04T16:18:48.5790468Z 
2025-12-04T16:18:48.5791024Z real	394m5.349s
2025-12-04T16:18:48.5791554Z user	405m53.780s
2025-12-04T16:18:48.5791831Z sys	54m54.899s
2025-12-04T16:18:48.5792074Z + sccache_epilogue
2025-12-04T16:18:48.5792399Z + echo '::group::Sccache Compilation Log'
2025-12-04T16:18:48.5793107Z ##[group]Sccache Compilation Log
2025-12-04T16:18:48.5793546Z + echo '=================== sccache compilation log ==================='
2025-12-04T16:18:48.5794016Z =================== sccache compilation log ===================
2025-12-04T16:18:48.5794767Z + python /var/lib/jenkins/workspace/.ci/pytorch/print_sccache_log.py /var/lib/jenkins/sccache_error.log
2025-12-04T16:18:48.5941586Z + echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
2025-12-04T16:18:48.5942619Z =========== If your build fails, please take a look at the log above for possible reasons ===========
2025-12-04T16:18:48.5943191Z + sccache --show-stats
2025-12-04T16:18:48.5976994Z Compile requests                   5079
2025-12-04T16:18:48.5977405Z Compile requests executed           265
2025-12-04T16:18:48.5977754Z Cache hits                          116
2025-12-04T16:18:48.5978109Z Cache hits (C/C++)                  116
2025-12-04T16:18:48.5978457Z Cache misses                        129
2025-12-04T16:18:48.5979217Z Cache misses (C/C++)                129
2025-12-04T16:18:48.5979791Z Cache hits rate                   47.35 %
2025-12-04T16:18:48.5980404Z Cache hits rate (C/C++)           47.35 %
2025-12-04T16:18:48.5981156Z Cache timeouts                        0
2025-12-04T16:18:48.5981802Z Cache read errors                     0
2025-12-04T16:18:48.5982159Z Forced recaches                       0
2025-12-04T16:18:48.5982521Z Cache write errors                    0
2025-12-04T16:18:48.5982892Z Cache errors                          0
2025-12-04T16:18:48.5983233Z Compilations                        129
2025-12-04T16:18:48.5983605Z Compilation failures                 20
2025-12-04T16:18:48.5983986Z Non-cacheable compilations            0
2025-12-04T16:18:48.5984347Z Non-cacheable calls                 145
2025-12-04T16:18:48.5984719Z Non-compilation calls              4669
2025-12-04T16:18:48.5985092Z Unsupported compiler calls            0
2025-12-04T16:18:48.5985453Z Average cache write               0.048 s
2025-12-04T16:18:48.5985831Z Average compiler                  8.181 s
2025-12-04T16:18:48.5986203Z Average cache read hit            0.049 s
2025-12-04T16:18:48.5986568Z Failed distributed compilations       0
2025-12-04T16:18:48.5986834Z 
2025-12-04T16:18:48.5986949Z Non-cacheable reasons:
2025-12-04T16:18:48.5987250Z unknown source language              80
2025-12-04T16:18:48.5987607Z -E                                   65
2025-12-04T16:18:48.5987861Z 
2025-12-04T16:18:48.5988129Z Cache location                  s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T16:18:48.5988663Z Version (client)                0.10.0
2025-12-04T16:18:48.5989175Z + sccache --stop-server
2025-12-04T16:18:48.6004194Z Stopping sccache server...
2025-12-04T16:18:48.6006595Z Compile requests                   5079
2025-12-04T16:18:48.6006990Z Compile requests executed           265
2025-12-04T16:18:48.6007345Z Cache hits                          116
2025-12-04T16:18:48.6007701Z Cache hits (C/C++)                  116
2025-12-04T16:18:48.6008073Z Cache misses                        129
2025-12-04T16:18:48.6008418Z Cache misses (C/C++)                129
2025-12-04T16:18:48.6008787Z Cache hits rate                   47.35 %
2025-12-04T16:18:48.6009172Z Cache hits rate (C/C++)           47.35 %
2025-12-04T16:18:48.6009579Z Cache timeouts                        0
2025-12-04T16:18:48.6010066Z Cache read errors                     0
2025-12-04T16:18:48.6010655Z Forced recaches                       0
2025-12-04T16:18:48.6011126Z Cache write errors                    0
2025-12-04T16:18:48.6011471Z Cache errors                          0
2025-12-04T16:18:48.6011830Z Compilations                        129
2025-12-04T16:18:48.6012214Z Compilation failures                 20
2025-12-04T16:18:48.6012582Z Non-cacheable compilations            0
2025-12-04T16:18:48.6012958Z Non-cacheable calls                 145
2025-12-04T16:18:48.6013323Z Non-compilation calls              4669
2025-12-04T16:18:48.6013701Z Unsupported compiler calls            0
2025-12-04T16:18:48.6014065Z Average cache write               0.048 s
2025-12-04T16:18:48.6014444Z Average compiler                  8.181 s
2025-12-04T16:18:48.6014817Z Average cache read hit            0.049 s
2025-12-04T16:18:48.6015190Z Failed distributed compilations       0
2025-12-04T16:18:48.6015460Z 
2025-12-04T16:18:48.6015568Z Non-cacheable reasons:
2025-12-04T16:18:48.6015874Z unknown source language              80
2025-12-04T16:18:48.6016216Z -E                                   65
2025-12-04T16:18:48.6016461Z 
2025-12-04T16:18:48.6016727Z Cache location                  s3, name: ossci-compiler-cache-circleci-v2, prefix: /
2025-12-04T16:18:48.6017257Z Version (client)                0.10.0
2025-12-04T16:18:48.6017622Z + echo ::endgroup::
2025-12-04T16:18:48.6018164Z ##[endgroup]
2025-12-04T16:18:48.6018403Z + cleanup_workspace
2025-12-04T16:18:48.6018987Z + echo 'sudo may print the following warning message that can be ignored. The chown command will still run.'
2025-12-04T16:18:48.6020078Z sudo may print the following warning message that can be ignored. The chown command will still run.
2025-12-04T16:18:48.6020806Z + echo '    sudo: setrlimit(RLIMIT_STACK): Operation not permitted'
2025-12-04T16:18:48.6021412Z     sudo: setrlimit(RLIMIT_STACK): Operation not permitted
2025-12-04T16:18:48.6022102Z + echo 'For more details refer to https://github.com/sudo-project/sudo/issues/42'
2025-12-04T16:18:48.6022801Z For more details refer to https://github.com/sudo-project/sudo/issues/42
2025-12-04T16:18:48.6023342Z + sudo chown -R 1000 /var/lib/jenkins/workspace
2025-12-04T16:18:49.3825717Z ##[error]Process completed with exit code 1.
2025-12-04T16:18:49.3914553Z Prepare all required actions
2025-12-04T16:18:49.3915019Z Getting action download info
2025-12-04T16:18:49.5981821Z ##[group]Run ./.github/actions/pytest-cache-upload
2025-12-04T16:18:49.5982228Z with:
2025-12-04T16:18:49.5982485Z   cache_dir: .pytest_cache
2025-12-04T16:18:49.5982786Z   shard: 4
2025-12-04T16:18:49.5983082Z   sha: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T16:18:49.5983492Z   test_config: legacy_nvidia_driver
2025-12-04T16:18:49.5983937Z   job_identifier: periodic_linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T16:18:49.5984380Z env:
2025-12-04T16:18:49.5984603Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:49.5984915Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:49.5985282Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:49.5985920Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:49.5986497Z ##[endgroup]
2025-12-04T16:18:49.6024647Z ##[group]Run nick-fields/retry@v3.0.0
2025-12-04T16:18:49.6025063Z with:
2025-12-04T16:18:49.6025288Z   shell: bash
2025-12-04T16:18:49.6025542Z   timeout_minutes: 5
2025-12-04T16:18:49.6025819Z   max_attempts: 5
2025-12-04T16:18:49.6026080Z   retry_wait_seconds: 30
2025-12-04T16:18:49.6026477Z   command: set -eu
python3 -m pip install boto3==1.35.42

2025-12-04T16:18:49.6026924Z   polling_interval_seconds: 1
2025-12-04T16:18:49.6027256Z   warning_on_retry: true
2025-12-04T16:18:49.6027543Z   continue_on_error: false
2025-12-04T16:18:49.6027839Z env:
2025-12-04T16:18:49.6028079Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:49.6028369Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:49.6028735Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:49.6029383Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:49.6029949Z ##[endgroup]
2025-12-04T16:18:49.9990819Z Defaulting to user installation because normal site-packages is not writeable
2025-12-04T16:18:51.2908527Z Collecting boto3==1.35.42
2025-12-04T16:18:51.3110175Z   Downloading boto3-1.35.42-py3-none-any.whl (139 kB)
2025-12-04T16:18:51.4038269Z Collecting s3transfer<0.11.0,>=0.10.0
2025-12-04T16:18:51.4085492Z   Downloading s3transfer-0.10.4-py3-none-any.whl (83 kB)
2025-12-04T16:18:52.7737998Z Collecting botocore<1.36.0,>=1.35.42
2025-12-04T16:18:52.7796126Z   Downloading botocore-1.35.99-py3-none-any.whl (13.3 MB)
2025-12-04T16:18:52.9416840Z Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /usr/lib/python3.9/site-packages (from boto3==1.35.42) (0.10.0)
2025-12-04T16:18:52.9487178Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.25.10)
2025-12-04T16:18:52.9492160Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/lib/python3.9/site-packages (from botocore<1.36.0,>=1.35.42->boto3==1.35.42) (2.8.1)
2025-12-04T16:18:53.1906517Z Requirement already satisfied: six>=1.5 in /usr/lib/python3.9/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.36.0,>=1.35.42->boto3==1.35.42) (1.15.0)
2025-12-04T16:18:53.2962599Z Installing collected packages: botocore, s3transfer, boto3
2025-12-04T16:18:53.9442925Z Successfully installed boto3-1.35.42 botocore-1.35.99 s3transfer-0.10.4
2025-12-04T16:18:54.6923608Z Command completed after 1 attempt(s).
2025-12-04T16:18:54.6979260Z ##[group]Run python3 .github/scripts/pytest_cache.py \
2025-12-04T16:18:54.6979893Z python3 .github/scripts/pytest_cache.py \
2025-12-04T16:18:54.6980315Z   --upload \
2025-12-04T16:18:54.6980679Z   --cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \
2025-12-04T16:18:54.6981192Z   --pr_identifier "$GITHUB_REF" \
2025-12-04T16:18:54.6981635Z   --job_identifier "$JOB_IDENTIFIER" \
2025-12-04T16:18:54.6982027Z   --sha "$SHA" \
2025-12-04T16:18:54.6982348Z   --test_config "$TEST_CONFIG" \
2025-12-04T16:18:54.6982724Z   --shard "$SHARD" \
2025-12-04T16:18:54.6983296Z   --repo "$REPO" \
2025-12-04T16:18:54.6983625Z   --temp_dir "$RUNNER_TEMP" \
2025-12-04T16:18:54.6994434Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T16:18:54.6994897Z env:
2025-12-04T16:18:54.6995150Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:54.6995450Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:54.6995814Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:54.6996476Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:54.6997071Z   CACHE_DIR: .pytest_cache
2025-12-04T16:18:54.6997473Z   JOB_IDENTIFIER: periodic_linux-jammy-cuda12.4-py3.10-gcc11
2025-12-04T16:18:54.6997977Z   SHA: ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32
2025-12-04T16:18:54.6998385Z   TEST_CONFIG: legacy_nvidia_driver
2025-12-04T16:18:54.6998709Z   SHARD: 4
2025-12-04T16:18:54.6998956Z   REPO: pytorch/pytorch
2025-12-04T16:18:54.6999243Z ##[endgroup]
2025-12-04T16:18:55.2051285Z PR identifier for `refs/heads/main` is `96e092540d6b3c4076e3d2bc6f1f9013`
2025-12-04T16:18:55.2053735Z Uploading cache with args Namespace(upload=True, download=False, cache_dir='/home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache', pr_identifier='refs/heads/main', job_identifier='periodic_linux-jammy-cuda12.4-py3.10-gcc11', sha='ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32', test_config='legacy_nvidia_driver', shard='4', repo='pytorch/pytorch', temp_dir='/home/ec2-user/actions-runner/_work/_temp', bucket=None)
2025-12-04T16:18:55.2056135Z Zipping /home/ec2-user/actions-runner/_work/pytorch/pytorch/.pytest_cache
2025-12-04T16:18:55.2057701Z      to /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/4
2025-12-04T16:18:55.2060206Z Uploading /home/ec2-user/actions-runner/_work/_temp/zip-upload/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/4.zip
2025-12-04T16:18:55.2062528Z        to s3://gha-artifacts/pytest_cache/pytorch/pytorch/96e092540d6b3c4076e3d2bc6f1f9013/periodic_linux-jammy-cuda12_4-py3_10-gcc11/ffd9b0fb4355e97af82fc42cf185c3ffa0fc0a32/legacy_nvidia_driver/4.zip
2025-12-04T16:18:55.2627102Z ##[group]Run cat test/**/*_toprint.log || true
2025-12-04T16:18:55.2627579Z cat test/**/*_toprint.log || true
2025-12-04T16:18:55.2634290Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T16:18:55.2634733Z env:
2025-12-04T16:18:55.2634985Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:55.2635293Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:55.2635657Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:55.2636315Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:55.2636902Z ##[endgroup]
2025-12-04T16:18:55.2739659Z cat: 'test/**/*_toprint.log': No such file or directory
2025-12-04T16:18:55.2769638Z ##[group]Run kill "$MONITOR_SCRIPT_PID"
2025-12-04T16:18:55.2770069Z kill "$MONITOR_SCRIPT_PID"
2025-12-04T16:18:55.2776305Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T16:18:55.2776758Z env:
2025-12-04T16:18:55.2777008Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:55.2777324Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:55.2777783Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:55.2778443Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:55.2779046Z   MONITOR_SCRIPT_PID: 68866
2025-12-04T16:18:55.2779411Z ##[endgroup]
2025-12-04T16:18:55.2805970Z /home/ec2-user/actions-runner/_work/_temp/580e4b3c-c61e-4546-b3f6-8e607cf3807e.sh: line 1: kill: (68866) - No such process
2025-12-04T16:18:55.2808095Z ##[error]Process completed with exit code 1.
2025-12-04T16:18:55.2949341Z Prepare all required actions
2025-12-04T16:18:55.2949835Z Getting action download info
2025-12-04T16:18:55.5256258Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2025-12-04T16:18:55.7809955Z Download action repository 'actions/upload-artifact@v4' (SHA:ea165f8d65b6e75b540449e92b4886f43607fa02)
2025-12-04T16:18:56.3065454Z ##[group]Run ./.github/actions/upload-test-artifacts
2025-12-04T16:18:56.3065880Z with:
2025-12-04T16:18:56.3066360Z   file-suffix: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427
2025-12-04T16:18:56.3066959Z   s3-bucket: gha-artifacts
2025-12-04T16:18:56.3067256Z env:
2025-12-04T16:18:56.3067483Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:56.3067801Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:56.3068164Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:56.3068799Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:56.3069426Z ##[endgroup]
2025-12-04T16:18:56.3095932Z ##[group]Run # Remove any previous test jsons if they exist
2025-12-04T16:18:56.3096459Z # Remove any previous test jsons if they exist
2025-12-04T16:18:56.3096903Z rm -f test-jsons-*.zip
2025-12-04T16:18:56.3097408Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test/test-reports -i '*.json'
2025-12-04T16:18:56.3104465Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T16:18:56.3104909Z env:
2025-12-04T16:18:56.3105159Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:56.3105473Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:56.3105826Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:56.3106479Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:56.3107292Z   FILE_SUFFIX: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427
2025-12-04T16:18:56.3107863Z ##[endgroup]
2025-12-04T16:18:56.3329005Z   adding: test/test-reports/td_exclusions-a20106558b25723d42f9.json (deflated 82%)
2025-12-04T16:18:56.3335390Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3469ffb5f6430eac.json (deflated 93%)
2025-12-04T16:18:56.3337375Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-c74aaedaf90eea12.json (deflated 91%)
2025-12-04T16:18:56.3339532Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-27dd4691bf7b3baf.json (deflated 91%)
2025-12-04T16:18:56.3344031Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-471109228b9bc8b1.json (deflated 94%)
2025-12-04T16:18:56.3374267Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-d9786e35c31a1406.json (deflated 93%)
2025-12-04T16:18:56.3399745Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-334d9946fa595278.json (deflated 94%)
2025-12-04T16:18:56.3403579Z   adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-96b82d738bd32122.json (deflated 89%)
2025-12-04T16:18:56.3405655Z   adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-5c40f8a5eb55b478.json (deflated 90%)
2025-12-04T16:18:56.3407818Z   adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ab540be19127662e.json (deflated 90%)
2025-12-04T16:18:56.3409402Z   adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ceb40d24a6394526.json (deflated 76%)
2025-12-04T16:18:56.3414935Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-3c3aadd8ccf63ac5.json (deflated 96%)
2025-12-04T16:18:56.3419979Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-61cf9773289d26de.json (deflated 95%)
2025-12-04T16:18:56.3425713Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-bddaa2f603017d2f.json (deflated 95%)
2025-12-04T16:18:56.3430484Z   adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-283ddf549cce6309.json (deflated 90%)
2025-12-04T16:18:56.3432395Z   adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-1b5ebcdca18d4e19.json (deflated 90%)
2025-12-04T16:18:56.3434352Z   adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-e19a61202ca16580.json (deflated 90%)
2025-12-04T16:18:56.3435823Z   adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-a3ba5f364f03aed8.json (deflated 90%)
2025-12-04T16:18:56.3437605Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a1f65e7d467aee95.json (deflated 87%)
2025-12-04T16:18:56.3438879Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-e6d248469cfc058f.json (deflated 81%)
2025-12-04T16:18:56.3440140Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-f3f2e4b24ff37d87.json (deflated 81%)
2025-12-04T16:18:56.3442964Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-381f6a62351f53ee.json (deflated 92%)
2025-12-04T16:18:56.3457377Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a962ee87389a597a.json (deflated 95%)
2025-12-04T16:18:56.3471784Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a729e49bf29a928c.json (deflated 95%)
2025-12-04T16:18:56.3476416Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a17633d8774721c5.json (deflated 91%)
2025-12-04T16:18:56.3485885Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3b77aed58497c4ef.json (deflated 97%)
2025-12-04T16:18:56.3495219Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-7136c4750752341b.json (deflated 97%)
2025-12-04T16:18:56.3497787Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-c9c365f110868c46.json (deflated 91%)
2025-12-04T16:18:56.3510210Z   adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-805c3a8113d13722.json (deflated 92%)
2025-12-04T16:18:56.3511585Z   adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-51c6b785ca935a69.json (deflated 57%)
2025-12-04T16:18:56.3517955Z   adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-e7f4556f4f4f751d.json (deflated 92%)
2025-12-04T16:18:56.3519399Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c40e88b21f3dd767.json (deflated 85%)
2025-12-04T16:18:56.3520881Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9074e5af9f7e7d92.json (deflated 85%)
2025-12-04T16:18:56.3522433Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-10ff13c663ad5077.json (deflated 85%)
2025-12-04T16:18:56.3524028Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee0de851594c228e.json (deflated 85%)
2025-12-04T16:18:56.3525525Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eb93cd35b9ecccb8.json (deflated 85%)
2025-12-04T16:18:56.3527183Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63eb31d4436f1164.json (deflated 85%)
2025-12-04T16:18:56.3528664Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8fe2f36a52fbcf80.json (deflated 86%)
2025-12-04T16:18:56.3530152Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cee8502954df528c.json (deflated 85%)
2025-12-04T16:18:56.3531643Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-48bbd6d243994e17.json (deflated 85%)
2025-12-04T16:18:56.3533134Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-04bee3cdcda101b6.json (deflated 86%)
2025-12-04T16:18:56.3534620Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0653410d18e9d78e.json (deflated 85%)
2025-12-04T16:18:56.3536094Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34a9d39084dff1b6.json (deflated 85%)
2025-12-04T16:18:56.3537581Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c9ee3a2d8186602.json (deflated 85%)
2025-12-04T16:18:56.3539069Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-126fca4cd7b29c10.json (deflated 85%)
2025-12-04T16:18:56.3540554Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eddfed0d2b029629.json (deflated 85%)
2025-12-04T16:18:56.3542048Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b22078b8c085cdcd.json (deflated 85%)
2025-12-04T16:18:56.3543521Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38e32e50c56cc24f.json (deflated 85%)
2025-12-04T16:18:56.3545012Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d85417ecba0abe7a.json (deflated 85%)
2025-12-04T16:18:56.3546493Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1802f570a905faf5.json (deflated 86%)
2025-12-04T16:18:56.3547972Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca420a576680224b.json (deflated 85%)
2025-12-04T16:18:56.3549450Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9a9f08c6e10d54f7.json (deflated 85%)
2025-12-04T16:18:56.3550927Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c59271afe170d67.json (deflated 85%)
2025-12-04T16:18:56.3552414Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb71b131031d8408.json (deflated 85%)
2025-12-04T16:18:56.3553895Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-565cf24db94440d1.json (deflated 85%)
2025-12-04T16:18:56.3555376Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-607f169455f7ccc0.json (deflated 85%)
2025-12-04T16:18:56.3556840Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-531db397873a40b2.json (deflated 85%)
2025-12-04T16:18:56.3558361Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a6ed46f8a6f71ef7.json (deflated 85%)
2025-12-04T16:18:56.3559835Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ab81f77c2cb5952.json (deflated 85%)
2025-12-04T16:18:56.3561411Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38d9a64e046ee91f.json (deflated 85%)
2025-12-04T16:18:56.3562958Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-747a72e37803dfe4.json (deflated 85%)
2025-12-04T16:18:56.3564426Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-54023c099f6c1322.json (deflated 85%)
2025-12-04T16:18:56.3565911Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ad9ca42cc99e9c7e.json (deflated 85%)
2025-12-04T16:18:56.3567401Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-068870c4e7b35c60.json (deflated 85%)
2025-12-04T16:18:56.3568880Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a0d0acf02d82ecbb.json (stored 0%)
2025-12-04T16:18:56.3570284Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-a2f9525a35872883.json (deflated 79%)
2025-12-04T16:18:56.3571644Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-6b09493f63855de7.json (deflated 65%)
2025-12-04T16:18:56.3573036Z   adding: test/test-reports/python-pytest/inductor.test_extension_backend/inductor.test_extension_backend-107c721ddd062adf.json (deflated 59%)
2025-12-04T16:18:56.3574402Z   adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-6880425f749978d6.json (deflated 86%)
2025-12-04T16:18:56.3575706Z   adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-469bba077eb48143.json (deflated 82%)
2025-12-04T16:18:56.3577023Z   adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-4093a29cc92449a3.json (deflated 82%)
2025-12-04T16:18:56.3578340Z   adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-f7b8d41d555aa509.json (deflated 82%)
2025-12-04T16:18:56.3579671Z   adding: test/test-reports/python-pytest/dynamo.test_fx_graph_runnable/dynamo.test_fx_graph_runnable-0790c18290928611.json (deflated 89%)
2025-12-04T16:18:56.3580931Z   adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-692fd365c2b33f50.json (deflated 92%)
2025-12-04T16:18:56.3582106Z   adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-8c32992e913c2c64.json (deflated 93%)
2025-12-04T16:18:56.3583287Z   adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-73235157d9df4ae2.json (deflated 93%)
2025-12-04T16:18:56.3584464Z   adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-9741d261d282c9ae.json (deflated 85%)
2025-12-04T16:18:56.3585628Z   adding: test/test-reports/python-pytest/dynamo.test_streams/dynamo.test_streams-061202c25215a4da.json (deflated 88%)
2025-12-04T16:18:56.3586886Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-ad02460068a39927.json (deflated 91%)
2025-12-04T16:18:56.3596303Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-e60f88ff4be47487.json (deflated 96%)
2025-12-04T16:18:56.3609522Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-2d7921f0967c562b.json (deflated 96%)
2025-12-04T16:18:56.3610941Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-38b03205b4b4e8b2.json (deflated 87%)
2025-12-04T16:18:56.3624001Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-4b997b321b918bd4.json (deflated 96%)
2025-12-04T16:18:56.3636878Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-c0ee399e0a993179.json (deflated 96%)
2025-12-04T16:18:56.3638397Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-a04a8b8b31c4f983.json (deflated 89%)
2025-12-04T16:18:56.3639854Z   adding: test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-b27b3789d1f96ec3.json (deflated 84%)
2025-12-04T16:18:56.3642264Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-25ac053e9312843a.json (deflated 94%)
2025-12-04T16:18:56.3643701Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-44c34e945447da70.json (deflated 83%)
2025-12-04T16:18:56.3645133Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-faae5acc9f254e31.json (deflated 83%)
2025-12-04T16:18:56.3646570Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1475705e30056d51.json (deflated 83%)
2025-12-04T16:18:56.3648004Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-91702530804e6018.json (deflated 83%)
2025-12-04T16:18:56.3649434Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5a377a8e3e546caa.json (deflated 83%)
2025-12-04T16:18:56.3650872Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2d9eb46c30fffb97.json (deflated 86%)
2025-12-04T16:18:56.3652299Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b2fcdf54f0dd8b56.json (deflated 86%)
2025-12-04T16:18:56.3653733Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e6655594e475c158.json (deflated 86%)
2025-12-04T16:18:56.3655168Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67b37ef947e223df.json (deflated 87%)
2025-12-04T16:18:56.3656596Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5d8e49cfad949fb4.json (deflated 83%)
2025-12-04T16:18:56.3658020Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b32d481ee6a300b7.json (deflated 83%)
2025-12-04T16:18:56.3659457Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-de77de01625a8457.json (deflated 83%)
2025-12-04T16:18:56.3660899Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ac8e542231b9ece8.json (deflated 83%)
2025-12-04T16:18:56.3662322Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8a7277668f29c6c0.json (deflated 83%)
2025-12-04T16:18:56.3663841Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cc01ae0bb83689a0.json (deflated 83%)
2025-12-04T16:18:56.3665264Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-99d5fd7f63dbe293.json (deflated 83%)
2025-12-04T16:18:56.3666692Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-826bb35711c419f6.json (deflated 83%)
2025-12-04T16:18:56.3668118Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e5e187e59c02465d.json (deflated 83%)
2025-12-04T16:18:56.3669621Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4a8119bc665e27c0.json (deflated 83%)
2025-12-04T16:18:56.3671034Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2025dabe1cea3938.json (deflated 83%)
2025-12-04T16:18:56.3672592Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-9d80dad9de413e50.json (deflated 87%)
2025-12-04T16:18:56.3674025Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d8878b3838c421bc.json (deflated 86%)
2025-12-04T16:18:56.3675461Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ccdefc43a9a17fe4.json (deflated 86%)
2025-12-04T16:18:56.3676899Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f6f73f3414e84f03.json (deflated 86%)
2025-12-04T16:18:56.3678322Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6fa30bf2f2d5eb51.json (deflated 83%)
2025-12-04T16:18:56.3679755Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-38918fbd281ed213.json (deflated 83%)
2025-12-04T16:18:56.3681181Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1f043ea296196952.json (deflated 85%)
2025-12-04T16:18:56.3682689Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5fb869340c48ef2f.json (deflated 83%)
2025-12-04T16:18:56.3684108Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0d5e946f00308484.json (deflated 83%)
2025-12-04T16:18:56.3685533Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c576f4628ae22849.json (deflated 86%)
2025-12-04T16:18:56.3686964Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-33928f913f155d05.json (deflated 83%)
2025-12-04T16:18:56.3688389Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-56a939c64c979699.json (deflated 83%)
2025-12-04T16:18:56.3689823Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d3b58dae1e6fa80b.json (deflated 83%)
2025-12-04T16:18:56.3691249Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ece32ee31ed5f94b.json (deflated 83%)
2025-12-04T16:18:56.3692693Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ff6bee4ccf71b3b1.json (deflated 83%)
2025-12-04T16:18:56.3694120Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-79a731795b247695.json (deflated 86%)
2025-12-04T16:18:56.3695543Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-964095f569ab5f18.json (deflated 86%)
2025-12-04T16:18:56.3696962Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c7f8a2bcbf5a7d94.json (deflated 86%)
2025-12-04T16:18:56.3698404Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b44a26383ab5bf86.json (deflated 83%)
2025-12-04T16:18:56.3699858Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-7c4c7b2c97f5ece3.json (deflated 83%)
2025-12-04T16:18:56.3701610Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35670228d9257748.json (deflated 83%)
2025-12-04T16:18:56.3703108Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c465169b2a187708.json (deflated 83%)
2025-12-04T16:18:56.3704557Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2b01ab5056f11e9c.json (deflated 83%)
2025-12-04T16:18:56.3706122Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-16aca9496f35b1a4.json (deflated 83%)
2025-12-04T16:18:56.3707541Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-193d78131cdd083a.json (deflated 83%)
2025-12-04T16:18:56.3708976Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dab7c947d86aa9a6.json (deflated 83%)
2025-12-04T16:18:56.3710408Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a0e52521b9f6fa85.json (deflated 83%)
2025-12-04T16:18:56.3711842Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5bf2204027ce2523.json (deflated 83%)
2025-12-04T16:18:56.3713275Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d6d9569795b0b902.json (deflated 83%)
2025-12-04T16:18:56.3714696Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4c910b821c44d2f5.json (deflated 83%)
2025-12-04T16:18:56.3716133Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-95077883b5abbff3.json (deflated 83%)
2025-12-04T16:18:56.3717568Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4d3bae777d67a79f.json (deflated 83%)
2025-12-04T16:18:56.3719004Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-118ad8744f1d4d27.json (deflated 83%)
2025-12-04T16:18:56.3720425Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61456af580a4b7ac.json (deflated 83%)
2025-12-04T16:18:56.3721868Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e90a690ff72dc1ab.json (deflated 83%)
2025-12-04T16:18:56.3723384Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6357b547ca746444.json (deflated 83%)
2025-12-04T16:18:56.3724820Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e10adef85f4d6151.json (deflated 84%)
2025-12-04T16:18:56.3726257Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-3c2c7e3f96ee06db.json (deflated 83%)
2025-12-04T16:18:56.3727699Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-111a9f95bebe1e39.json (deflated 83%)
2025-12-04T16:18:56.3729132Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-87f44cfa0e8a9d8f.json (deflated 83%)
2025-12-04T16:18:56.3730568Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ffc35ad917f63350.json (deflated 83%)
2025-12-04T16:18:56.3732016Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bb2bca61f02d857f.json (deflated 83%)
2025-12-04T16:18:56.3733443Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-17f448aea025f304.json (deflated 86%)
2025-12-04T16:18:56.3734871Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85582e9ee40ebc55.json (deflated 86%)
2025-12-04T16:18:56.3736301Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c795322010e61bce.json (deflated 86%)
2025-12-04T16:18:56.3737774Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-af1ce6171d14e609.json (deflated 85%)
2025-12-04T16:18:56.3739239Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-00b52dc1e610ac68.json (deflated 83%)
2025-12-04T16:18:56.3740716Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-40be700c41c1be61.json (deflated 83%)
2025-12-04T16:18:56.3742152Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-063cd6c16f492c0b.json (deflated 83%)
2025-12-04T16:18:56.3743592Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cdb46a62f836b20.json (deflated 83%)
2025-12-04T16:18:56.3745025Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-50364e1db5a413f2.json (deflated 83%)
2025-12-04T16:18:56.3746445Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-329d5d08d886772a.json (deflated 84%)
2025-12-04T16:18:56.3747877Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8e3e317a92830ba6.json (deflated 83%)
2025-12-04T16:18:56.3749313Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-fba34ccbfe47be41.json (deflated 83%)
2025-12-04T16:18:56.3750754Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67eecf299b49620e.json (deflated 83%)
2025-12-04T16:18:56.3752175Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-689365daff97a217.json (deflated 83%)
2025-12-04T16:18:56.3753610Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61d7df0dfd715866.json (deflated 83%)
2025-12-04T16:18:56.3755038Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bbb315e2c7566474.json (deflated 90%)
2025-12-04T16:18:56.3756483Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cbdafed15e10f46.json (deflated 82%)
2025-12-04T16:18:56.3757913Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-250d1e9631b51e82.json (deflated 82%)
2025-12-04T16:18:56.3759326Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-659d038e96b5f102.json (deflated 92%)
2025-12-04T16:18:56.3760749Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-52f302a009c99a45.json (deflated 83%)
2025-12-04T16:18:56.3762237Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-edb5da82dbb96991.json (deflated 83%)
2025-12-04T16:18:56.3763674Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-26ee580f1806e0f2.json (deflated 91%)
2025-12-04T16:18:56.3765141Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b7cfd41a69868cc6.json (deflated 83%)
2025-12-04T16:18:56.3766555Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4573f1e428dcb095.json (deflated 83%)
2025-12-04T16:18:56.3767979Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2c1257cd859214a9.json (deflated 86%)
2025-12-04T16:18:56.3769408Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6b4b9f12b6851f04.json (deflated 83%)
2025-12-04T16:18:56.3770885Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-469eeaa86aae0ce8.json (deflated 83%)
2025-12-04T16:18:56.3772340Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f89f3afb1f628785.json (deflated 86%)
2025-12-04T16:18:56.3773849Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85fcc5c00efd74bd.json (deflated 86%)
2025-12-04T16:18:56.3775283Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dcb0b47762861151.json (deflated 86%)
2025-12-04T16:18:56.3776703Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31949d00d4596283.json (deflated 85%)
2025-12-04T16:18:56.3778135Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b6b2b2997a48fffb.json (deflated 83%)
2025-12-04T16:18:56.3779553Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b63fc96940c5dfca.json (deflated 83%)
2025-12-04T16:18:56.3780985Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31833c8bcf86882f.json (deflated 84%)
2025-12-04T16:18:56.3782429Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6dc75b6b5f29fbb9.json (deflated 83%)
2025-12-04T16:18:56.3783867Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d97c974b9c50bec3.json (deflated 83%)
2025-12-04T16:18:56.3785289Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-588d96b64bf97b8d.json (deflated 85%)
2025-12-04T16:18:56.3786719Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-78303b7c44b57e72.json (deflated 83%)
2025-12-04T16:18:56.3788149Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0fe2928d1b5c12d6.json (deflated 83%)
2025-12-04T16:18:56.3789589Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85a344e8e648e5ca.json (deflated 83%)
2025-12-04T16:18:56.3791006Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d0617f72a4b97751.json (deflated 83%)
2025-12-04T16:18:56.3792434Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e1acf558219bc739.json (deflated 83%)
2025-12-04T16:18:56.3793867Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d7d30d97e183551e.json (deflated 86%)
2025-12-04T16:18:56.3795295Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-42d654f8293abc5a.json (deflated 83%)
2025-12-04T16:18:56.3796724Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ee7c83ecdc672647.json (deflated 83%)
2025-12-04T16:18:56.3798145Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0ceb6628ed982867.json (deflated 83%)
2025-12-04T16:18:56.3799572Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8928a6b00b051b8.json (deflated 83%)
2025-12-04T16:18:56.3801166Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-638b7d3a6684657f.json (deflated 83%)
2025-12-04T16:18:56.3802673Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-28d8e196fd24a123.json (deflated 87%)
2025-12-04T16:18:56.3804536Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-618da663b64859ce.json (deflated 86%)
2025-12-04T16:18:56.3805974Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0193ecefca06b5b7.json (deflated 86%)
2025-12-04T16:18:56.3807564Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-94e72f0552a6d934.json (deflated 86%)
2025-12-04T16:18:56.3809004Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8e966a64a8d91b0.json (deflated 86%)
2025-12-04T16:18:56.3810423Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6498f30a7931ed78.json (deflated 86%)
2025-12-04T16:18:56.3811847Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35a9228a36f00ca8.json (deflated 83%)
2025-12-04T16:18:56.3813280Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6d31e9c231a839ae.json (deflated 83%)
2025-12-04T16:18:56.3814718Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-15938e0b51a5f238.json (deflated 83%)
2025-12-04T16:18:56.3816156Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2f0e8060bc3a964c.json (deflated 85%)
2025-12-04T16:18:56.3817594Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a7bcf286e5b1017b.json (deflated 83%)
2025-12-04T16:18:56.3819029Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c21fd94b2a445d75.json (deflated 83%)
2025-12-04T16:18:56.3820470Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e84bcc8fc890320e.json (deflated 83%)
2025-12-04T16:18:56.3821909Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-71b1acdff50f0444.json (deflated 83%)
2025-12-04T16:18:56.3823333Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f8b74fab1a7c01df.json (deflated 83%)
2025-12-04T16:18:56.3824776Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cda23e9a2cebd271.json (deflated 83%)
2025-12-04T16:18:56.3826215Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6ef0f921a65804fa.json (deflated 83%)
2025-12-04T16:18:56.3827646Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4beacf6124c4825f.json (deflated 83%)
2025-12-04T16:18:56.3829085Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8d5eb132574c3bbb.json (deflated 82%)
2025-12-04T16:18:56.4104460Z   adding: test/test-reports/python-pytest/test_transformers/test_transformers-314991beba6d5b67.json (deflated 99%)
2025-12-04T16:18:56.4126210Z   adding: test/test-reports/python-pytest/test_autograd/test_autograd-9411f135e03cf921.json (deflated 93%)
2025-12-04T16:18:56.4156350Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-8ac5504ea5d63e83.json (deflated 97%)
2025-12-04T16:18:56.4168518Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-b93e416e4714efc8.json (deflated 95%)
2025-12-04T16:18:56.4181791Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-298d565a78b93d88.json (deflated 95%)
2025-12-04T16:18:56.4193802Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-da1c924c8984f5ba.json (deflated 95%)
2025-12-04T16:18:56.4206912Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-20c517b051912976.json (deflated 95%)
2025-12-04T16:18:56.4382927Z   adding: test/test-reports/python-pytest/test_meta/test_meta-0566a97fe52d3e43.json (deflated 97%)
2025-12-04T16:18:56.4399188Z   adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-c099bcb3f2a041ec.json (deflated 97%)
2025-12-04T16:18:56.4436448Z   adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-b4c65009171fef32.json (deflated 98%)
2025-12-04T16:18:56.4519467Z   adding: test/test-reports/python-pytest/test_ops/test_ops-9d1debb5033aecec.json (deflated 96%)
2025-12-04T16:18:56.4606445Z   adding: test/test-reports/python-pytest/test_ops/test_ops-9b78a46860708967.json (deflated 97%)
2025-12-04T16:18:56.4649568Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-bd6912e48e96c8e4.json (deflated 95%)
2025-12-04T16:18:56.4690894Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-da40a8ab5c416f48.json (deflated 95%)
2025-12-04T16:18:56.4705835Z   adding: test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-5dd5f1708cbcb0aa.json (deflated 97%)
2025-12-04T16:18:56.4714116Z   adding: test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-85c358a1ca92a817.json (deflated 95%)
2025-12-04T16:18:56.4715612Z   adding: test/test-reports/python-pytest/inductor.test_cpu_select_algorithm/inductor.test_cpu_select_algorithm-99091fae53aceb8e.json (stored 0%)
2025-12-04T16:18:56.4720689Z   adding: test/test-reports/python-pytest/test_custom_ops/test_custom_ops-7a9f392fc312693f.json (deflated 95%)
2025-12-04T16:18:56.4721931Z   adding: test/test-reports/python-pytest/inductor.test_analysis/inductor.test_analysis-ef614f735877f798.json (deflated 95%)
2025-12-04T16:18:56.4723187Z   adding: test/test-reports/python-pytest/inductor.test_pad_mm/inductor.test_pad_mm-cc450381ece2a8f9.json (deflated 94%)
2025-12-04T16:18:56.4724447Z   adding: test/test-reports/python-pytest/inductor.test_triton_syntax/inductor.test_triton_syntax-898dc985a45c41c6.json (deflated 62%)
2025-12-04T16:18:56.4725883Z   adding: test/test-reports/python-pytest/inductor.test_triton_extension_backend/inductor.test_triton_extension_backend-1a18cee9beef4f55.json (stored 0%)
2025-12-04T16:18:56.4727311Z   adding: test/test-reports/python-pytest/test_sparse_semi_structured/test_sparse_semi_structured-4f8d9547a4d851ec.json (deflated 95%)
2025-12-04T16:18:56.4728653Z   adding: test/test-reports/python-pytest/inductor.test_op_completeness/inductor.test_op_completeness-7d3f24a957250fde.json (deflated 80%)
2025-12-04T16:18:56.4730040Z   adding: test/test-reports/python-pytest/inductor.test_subgraph_choice/inductor.test_subgraph_choice-2437d978fade4f96.json (deflated 63%)
2025-12-04T16:18:56.4731441Z   adding: test/test-reports/python-pytest/inductor.test_cutedsl_grouped_mm/inductor.test_cutedsl_grouped_mm-9a993ae92ea5ca0a.json (deflated 96%)
2025-12-04T16:18:56.4732858Z   adding: test/test-reports/python-pytest/inductor.test_cpp_wrapper_hipify/inductor.test_cpp_wrapper_hipify-5078284f3b2f2998.json (deflated 74%)
2025-12-04T16:18:56.4734238Z   adding: test/test-reports/python-pytest/inductor.test_inductor_utils/inductor.test_inductor_utils-fea0c873b74a6a46.json (deflated 57%)
2025-12-04T16:18:56.4735755Z   adding: test/test-reports/python-pytest/inductor.test_template_heuristics_registry/inductor.test_template_heuristics_registry-f03db733e7237771.json (deflated 83%)
2025-12-04T16:18:56.4737251Z   adding: test/test-reports/python-pytest/inductor.test_async_compile/inductor.test_async_compile-26761717acf278af.json (deflated 89%)
2025-12-04T16:18:56.4738604Z   adding: test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-87f577525bf4c9e0.json (deflated 76%)
2025-12-04T16:18:56.4739866Z   adding: test/test-reports/python-pytest/inductor.test_utils/inductor.test_utils-906071f9e5aa0510.json (deflated 75%)
2025-12-04T16:18:56.4741057Z   adding: test/test-reports/python-pytest/inductor.test_indexing/inductor.test_indexing-059deccacca9b28a.json (deflated 87%)
2025-12-04T16:18:56.4742518Z   adding: test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-a710efcfde282e90.json (deflated 72%)
2025-12-04T16:18:56.4744010Z   adding: test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-2b558a130ccb3642.json (deflated 92%)
2025-12-04T16:18:56.4745349Z   adding: test/test-reports/python-pytest/dynamo.test_einops/dynamo.test_einops-c0dc34cc00c52c06.json (deflated 77%)
2025-12-04T16:18:56.4746639Z   adding: test/test-reports/python-pytest/inductor.test_external_callables/inductor.test_external_callables-00ffeed03000c0d3.json (deflated 76%)
2025-12-04T16:18:56.4772809Z   adding: test/test-reports/python-pytest/test_testing/test_testing-69992b4cd6aabeac.json (deflated 97%)
2025-12-04T16:18:56.4773997Z   adding: test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-48a63e950c2eb9b4.json (deflated 33%)
2025-12-04T16:18:56.4812422Z   adding: test/test-reports/python-pytest/export.test_strict_export_v2/export.test_strict_export_v2-e896fc6c8f5f5413.json (deflated 96%)
2025-12-04T16:18:56.4813871Z   adding: test/test-reports/python-pytest/export.test_functionalized_assertions/export.test_functionalized_assertions-9948d5e6dd7869dd.json (deflated 65%)
2025-12-04T16:18:56.4815382Z   adding: test/test-reports/python-pytest/inductor.test_selective_lowering/inductor.test_selective_lowering-3443f84bc8e0d9ea.json (deflated 71%)
2025-12-04T16:18:56.4816718Z   adding: test/test-reports/python-pytest/dynamo.test_base_output/dynamo.test_base_output-444b9e9b2896f7db.json (deflated 88%)
2025-12-04T16:18:56.4826606Z   adding: test/test-reports/python-pytest/export.test_serialize/export.test_serialize-c63da72846ec1ca6.json (deflated 94%)
2025-12-04T16:18:56.4827984Z   adding: test/test-reports/python-pytest/inductor.test_move_constructors_to_gpu/inductor.test_move_constructors_to_gpu-68ab4975dd79b7d5.json (deflated 85%)
2025-12-04T16:18:56.4829412Z   adding: test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-3da887a4cab9e620.json (deflated 74%)
2025-12-04T16:18:56.4830825Z   adding: test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6824af132d005f6c.json (deflated 74%)
2025-12-04T16:18:56.4832306Z   adding: test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-f371eec712e8c5c4.json (deflated 87%)
2025-12-04T16:18:56.4833681Z   adding: test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-2709b5a1f66ec7aa.json (deflated 70%)
2025-12-04T16:18:56.4835039Z   adding: test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-8db87fb30c1e8868.json (deflated 62%)
2025-12-04T16:18:56.4836343Z   adding: test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-f206ac6f91b833b9.json (deflated 66%)
2025-12-04T16:18:56.4853473Z   adding: test/test-reports/python-pytest/inductor.test_foreach/inductor.test_foreach-dd7ec36049f8e4a8.json (deflated 98%)
2025-12-04T16:18:56.4866691Z   adding: test/test-reports/python-pytest/inductor.test_cache/inductor.test_cache-b64adfa949e710fa.json (deflated 98%)
2025-12-04T16:18:56.4867856Z   adding: test/test-reports/python-pytest/dynamo.test_config/dynamo.test_config-b59ec438e7f139b2.json (deflated 78%)
2025-12-04T16:18:56.4869095Z   adding: test/test-reports/python-pytest/dynamo.test_metrics_context/dynamo.test_metrics_context-8c54ce911c65a1d8.json (deflated 87%)
2025-12-04T16:18:56.4870332Z   adding: test/test-reports/python-pytest/export.test_package/export.test_package-ca7d9252e60c0b85.json (deflated 78%)
2025-12-04T16:18:56.4871461Z   adding: test/test-reports/python-pytest/dynamo.test_nops/dynamo.test_nops-06a6514c719bc621.json (deflated 79%)
2025-12-04T16:18:56.4872781Z   adding: test/test-reports/python-pytest/inductor.test_graph_transform_observer/inductor.test_graph_transform_observer-7fa27194a995b7de.json (deflated 39%)
2025-12-04T16:18:56.4874716Z   adding: test/test-reports/python-pytest/export.test_db/export.test_db-656b1fb51498c2a2.json (deflated 89%)
2025-12-04T16:18:56.4875905Z   adding: test/test-reports/python-pytest/dynamo.test_export_mutations/dynamo.test_export_mutations-ac0f456ff528df13.json (deflated 84%)
2025-12-04T16:18:56.4877242Z   adding: test/test-reports/python-pytest/inductor.test_config/inductor.test_config-891cd7b3aeb3b5ed.json (deflated 84%)
2025-12-04T16:18:56.4878598Z   adding: test/test-reports/python-pytest/inductor.test_dependencies/inductor.test_dependencies-0956f606bfbef853.json (deflated 83%)
2025-12-04T16:18:56.4991541Z   adding: test/test-reports/python-pytest/inductor.test_fuzzer/inductor.test_fuzzer-848012b685a936d2.json (deflated 88%)
2025-12-04T16:18:56.4992711Z   adding: test/test-reports/python-pytest/dynamo.test_global/dynamo.test_global-3f6b17294db437b1.json (deflated 86%)
2025-12-04T16:18:56.5014760Z   adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f876791985cb5a1a.json (deflated 97%)
2025-12-04T16:18:56.5016424Z   adding: test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-f8e6c8e1da70ac34.json (deflated 87%)
2025-12-04T16:18:56.5017691Z   adding: test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e6a1f3fd35374247.json (deflated 92%)
2025-12-04T16:18:56.5018928Z   adding: test/test-reports/python-pytest/dynamo.test_profiler/dynamo.test_profiler-4c5fdfc03a5c6f47.json (deflated 78%)
2025-12-04T16:18:56.5022126Z   adding: test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-ad1a0cf4b0a5764d.json (deflated 90%)
2025-12-04T16:18:56.5028531Z   adding: test/test-reports/python-pytest/dynamo.test_dicts/dynamo.test_dicts-e677e083bbe15d92.json (deflated 92%)
2025-12-04T16:18:56.5029809Z   adding: test/test-reports/python-pytest/dynamo.test_optimizers/dynamo.test_optimizers-a32616c44840c4cb.json (deflated 69%)
2025-12-04T16:18:56.5032753Z   adding: test/test-reports/python-pytest/export.test_torchbind/export.test_torchbind-5ef54f6c3fc7e6e3.json (deflated 95%)
2025-12-04T16:18:56.5034047Z   adding: test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-323f6251761a8aee.json (deflated 84%)
2025-12-04T16:18:56.5035268Z   adding: test/test-reports/python-pytest/export.test_swap/export.test_swap-6940316a22c03b83.json (deflated 94%)
2025-12-04T16:18:56.5036619Z   adding: test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-ab02733f663f09d1.json (deflated 94%)
2025-12-04T16:18:56.5037919Z   adding: test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-a822576ee13d2405.json (deflated 72%)
2025-12-04T16:18:56.5042369Z   adding: test/test-reports/python-pytest/inductor.test_fxir_backend/inductor.test_fxir_backend-0ddc410876940750.json (deflated 93%)
2025-12-04T16:18:56.5044308Z   adding: test/test-reports/python-pytest/dynamo.test_structured_trace/dynamo.test_structured_trace-c4539ed3e1c3f3d2.json (deflated 90%)
2025-12-04T16:18:56.5045553Z   adding: test/test-reports/python-pytest/dynamo.test_torchrec/dynamo.test_torchrec-a739d4d8dd7fe6db.json (stored 0%)
2025-12-04T16:18:56.5046821Z   adding: test/test-reports/python-pytest/test_model_exports_to_core_aten/test_model_exports_to_core_aten-ca8aa6cdcebd4c55.json (deflated 59%)
2025-12-04T16:18:56.5048210Z   adding: test/test-reports/python-pytest/dynamo.test_precompile_context/dynamo.test_precompile_context-d3b456bb7c9f74bf.json (deflated 81%)
2025-12-04T16:18:56.5049522Z   adding: test/test-reports/python-pytest/dynamo.test_trace_rules/dynamo.test_trace_rules-cb7e3d7c5a436002.json (deflated 78%)
2025-12-04T16:18:56.5050727Z   adding: test/test-reports/python-pytest/export.test_upgrader/export.test_upgrader-e574684e7a6f5e02.json (deflated 82%)
2025-12-04T16:18:56.5051858Z   adding: test/test-reports/python-pytest/dynamo.test_hooks/dynamo.test_hooks-05127548b561fef1.json (deflated 88%)
2025-12-04T16:18:56.5053417Z   adding: test/test-reports/python-pytest/dynamo.test_generator/dynamo.test_generator-92f221726c5985b1.json (deflated 93%)
2025-12-04T16:18:56.5054616Z   adding: test/test-reports/python-pytest/export.test_verifier/export.test_verifier-edb630c9e71930f9.json (deflated 85%)
2025-12-04T16:18:56.5056324Z   adding: test/test-reports/python-pytest/export.test_sparse/export.test_sparse-c54c4a64a1413ccc.json (deflated 95%)
2025-12-04T16:18:56.5057536Z   adding: test/test-reports/python-pytest/functorch.test_ac/functorch.test_ac-9bf963042854be08.json (deflated 86%)
2025-12-04T16:18:56.5058639Z   adding: test/test-reports/python-pytest/test_out_dtype_op/test_out_dtype_op-014adb2ecaedb28b.json (deflated 88%)
2025-12-04T16:18:56.5066470Z   adding: test/test-reports/python-pytest/torch_np.test_ufuncs_basic/torch_np.test_ufuncs_basic-614b306d768a8662.json (deflated 98%)
2025-12-04T16:18:56.5067717Z   adding: test/test-reports/python-pytest/lazy.test_step_closures/lazy.test_step_closures-4de838954d52331d.json (deflated 80%)
2025-12-04T16:18:56.5069024Z   adding: test/test-reports/python-pytest/functorch.dim.test_getsetitem/functorch.dim.test_getsetitem-d5e6ac7560412ef9.json (deflated 93%)
2025-12-04T16:18:56.5101002Z   adding: test/test-reports/python-pytest/test_fx/test_fx-d5755757c0de9fe5.json (deflated 97%)
2025-12-04T16:18:56.5101981Z   adding: test/test-reports/python-pytest/test_autocast/test_autocast-fd8082499cdeffdb.json (deflated 90%)
2025-12-04T16:18:56.5103029Z   adding: test/test-reports/python-pytest/test_logging/test_logging-07e1a05cccd3a8b9.json (deflated 32%)
2025-12-04T16:18:56.5106032Z   adding: test/test-reports/python-pytest/test_python_dispatch/test_python_dispatch-e290291b25b2a739.json (deflated 92%)
2025-12-04T16:18:56.5107373Z   adding: test/test-reports/python-pytest/nn.test_lazy_modules/nn.test_lazy_modules-90c11bd89c9c9697.json (deflated 94%)
2025-12-04T16:18:56.5108483Z   adding: test/test-reports/python-pytest/nn.test_pruning/nn.test_pruning-e4f9b7a61d3080de.json (deflated 94%)
2025-12-04T16:18:56.5109520Z   adding: test/test-reports/python-pytest/test_monitor/test_monitor-821063f2b7915ea1.json (deflated 83%)
2025-12-04T16:18:56.5110597Z   adding: test/test-reports/python-pytest/test_cuda_sanitizer/test_cuda_sanitizer-32e74fc9c7695511.json (deflated 93%)
2025-12-04T16:18:56.5111740Z   adding: test/test-reports/python-pytest/test_bundled_inputs/test_bundled_inputs-35f6835618e9721e.json (deflated 84%)
2025-12-04T16:18:56.5117725Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numeric/torch_np.numpy_tests.core.test_numeric-1a155fd517c13e25.json (deflated 96%)
2025-12-04T16:18:56.5141073Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_multiarray/torch_np.numpy_tests.core.test_multiarray-86fe7342be381be4.json (deflated 97%)
2025-12-04T16:18:56.5142333Z   adding: test/test-reports/python-pytest/test_itt/test_itt-7f15e1ebb20f1faf.json (deflated 33%)
2025-12-04T16:18:56.5155269Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_function_base/torch_np.numpy_tests.lib.test_function_base-c71be2950500ec80.json (deflated 97%)
2025-12-04T16:18:56.5159079Z   adding: test/test-reports/python-pytest/test_masked/test_masked-0947e6a84ac8b531.json (deflated 97%)
2025-12-04T16:18:56.5161699Z   adding: test/test-reports/python-pytest/test_datapipe/test_datapipe-62d690fc79a0a517.json (deflated 93%)
2025-12-04T16:18:56.5181261Z   adding: test/test-reports/python-pytest/nn.test_convolution/nn.test_convolution-b018917052e39f95.json (deflated 97%)
2025-12-04T16:18:56.5185293Z   adding: test/test-reports/python-pytest/test_indexing/test_indexing-f48226185e6ca57a.json (deflated 95%)
2025-12-04T16:18:56.5187602Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bea76ae62a6a548e.json (deflated 97%)
2025-12-04T16:18:56.5189646Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_shape_base_/torch_np.numpy_tests.lib.test_shape_base_-4cf3761fefa68714.json (deflated 95%)
2025-12-04T16:18:56.5191122Z   adding: test/test-reports/python-pytest/test_cpp_extensions_jit/test_cpp_extensions_jit-2038af5833d07a07.json (deflated 89%)
2025-12-04T16:18:56.5192414Z   adding: test/test-reports/python-pytest/profiler.test_python_tracer/profiler.test_python_tracer-4e1c7f97ddacb52a.json (deflated 73%)
2025-12-04T16:18:56.5197157Z   adding: test/test-reports/python-pytest/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility-c0abede9e59e118f.json (deflated 96%)
2025-12-04T16:18:56.5203113Z   adding: test/test-reports/python-pytest/distributions.test_distributions/distributions.test_distributions-390f18d46cafc91e.json (deflated 94%)
2025-12-04T16:18:56.5234202Z ##[group]Run # Remove any previous test reports if they exist
2025-12-04T16:18:56.5234770Z # Remove any previous test reports if they exist
2025-12-04T16:18:56.5235226Z rm -f test-reports-*.zip
2025-12-04T16:18:56.5235786Z zip -r "test-reports-${FILE_SUFFIX}.zip" test/test-reports -i '*.xml' -i '*.csv'
2025-12-04T16:18:56.5242777Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T16:18:56.5243220Z env:
2025-12-04T16:18:56.5243480Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:56.5243780Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:56.5244151Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:56.5244816Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:56.5245617Z   FILE_SUFFIX: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427
2025-12-04T16:18:56.5246196Z ##[endgroup]
2025-12-04T16:18:56.5386452Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-3469ffb5f6430eac.xml (deflated 92%)
2025-12-04T16:18:56.5388354Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-c74aaedaf90eea12.xml (deflated 90%)
2025-12-04T16:18:56.5390480Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-27dd4691bf7b3baf.xml (deflated 90%)
2025-12-04T16:18:56.5394539Z   adding: test/test-reports/python-pytest/inductor.test_aot_inductor/inductor.test_aot_inductor-471109228b9bc8b1.xml (deflated 92%)
2025-12-04T16:18:56.5422532Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-d9786e35c31a1406.xml (deflated 92%)
2025-12-04T16:18:56.5452501Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_dynamic_shapes/inductor.test_torchinductor_dynamic_shapes-334d9946fa595278.xml (deflated 93%)
2025-12-04T16:18:56.5455901Z   adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-96b82d738bd32122.xml (deflated 88%)
2025-12-04T16:18:56.5458113Z   adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-5c40f8a5eb55b478.xml (deflated 89%)
2025-12-04T16:18:56.5460340Z   adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ab540be19127662e.xml (deflated 89%)
2025-12-04T16:18:56.5461732Z   adding: test/test-reports/python-pytest/inductor.test_kernel_benchmark/inductor.test_kernel_benchmark-ceb40d24a6394526.xml (deflated 73%)
2025-12-04T16:18:56.5465919Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-3c3aadd8ccf63ac5.xml (deflated 93%)
2025-12-04T16:18:56.5469828Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-61cf9773289d26de.xml (deflated 92%)
2025-12-04T16:18:56.5474228Z   adding: test/test-reports/python-pytest/inductor.test_torchinductor_opinfo/inductor.test_torchinductor_opinfo-bddaa2f603017d2f.xml (deflated 92%)
2025-12-04T16:18:56.5478650Z   adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-283ddf549cce6309.xml (deflated 88%)
2025-12-04T16:18:56.5480640Z   adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-1b5ebcdca18d4e19.xml (deflated 89%)
2025-12-04T16:18:56.5482654Z   adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-e19a61202ca16580.xml (deflated 89%)
2025-12-04T16:18:56.5484507Z   adding: test/test-reports/python-pytest/inductor.test_pattern_matcher/inductor.test_pattern_matcher-a3ba5f364f03aed8.xml (deflated 87%)
2025-12-04T16:18:56.5485863Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a1f65e7d467aee95.xml (deflated 84%)
2025-12-04T16:18:56.5487110Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-e6d248469cfc058f.xml (deflated 80%)
2025-12-04T16:18:56.5488355Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-f3f2e4b24ff37d87.xml (deflated 80%)
2025-12-04T16:18:56.5490960Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-381f6a62351f53ee.xml (deflated 91%)
2025-12-04T16:18:56.5506144Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a962ee87389a597a.xml (deflated 95%)
2025-12-04T16:18:56.5520779Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a729e49bf29a928c.xml (deflated 95%)
2025-12-04T16:18:56.5525455Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-a17633d8774721c5.xml (deflated 89%)
2025-12-04T16:18:56.5547542Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-3b77aed58497c4ef.xml (deflated 95%)
2025-12-04T16:18:56.5548796Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-7136c4750752341b.xml (deflated 95%)
2025-12-04T16:18:56.5550664Z   adding: test/test-reports/python-pytest/inductor.test_cuda_repro/inductor.test_cuda_repro-c9c365f110868c46.xml (deflated 89%)
2025-12-04T16:18:56.5562126Z   adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-805c3a8113d13722.xml (deflated 91%)
2025-12-04T16:18:56.5563503Z   adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-51c6b785ca935a69.xml (deflated 55%)
2025-12-04T16:18:56.5569074Z   adding: test/test-reports/python-pytest/inductor.test_cudagraph_trees/inductor.test_cudagraph_trees-e7f4556f4f4f751d.xml (deflated 90%)
2025-12-04T16:18:56.5570494Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-c40e88b21f3dd767.xml (deflated 85%)
2025-12-04T16:18:56.5571966Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9074e5af9f7e7d92.xml (deflated 84%)
2025-12-04T16:18:56.5573420Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-10ff13c663ad5077.xml (deflated 84%)
2025-12-04T16:18:56.5574880Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ee0de851594c228e.xml (deflated 85%)
2025-12-04T16:18:56.5576346Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eb93cd35b9ecccb8.xml (deflated 84%)
2025-12-04T16:18:56.5577804Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-63eb31d4436f1164.xml (deflated 84%)
2025-12-04T16:18:56.5579264Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8fe2f36a52fbcf80.xml (deflated 85%)
2025-12-04T16:18:56.5580711Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-cee8502954df528c.xml (deflated 84%)
2025-12-04T16:18:56.5582176Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-48bbd6d243994e17.xml (deflated 84%)
2025-12-04T16:18:56.5583734Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-04bee3cdcda101b6.xml (deflated 85%)
2025-12-04T16:18:56.5585247Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-0653410d18e9d78e.xml (deflated 84%)
2025-12-04T16:18:56.5586786Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-34a9d39084dff1b6.xml (deflated 84%)
2025-12-04T16:18:56.5588251Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c9ee3a2d8186602.xml (deflated 85%)
2025-12-04T16:18:56.5589713Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-126fca4cd7b29c10.xml (deflated 84%)
2025-12-04T16:18:56.5591178Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-eddfed0d2b029629.xml (deflated 84%)
2025-12-04T16:18:56.5592640Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-b22078b8c085cdcd.xml (deflated 85%)
2025-12-04T16:18:56.5594094Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38e32e50c56cc24f.xml (deflated 84%)
2025-12-04T16:18:56.5595560Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-d85417ecba0abe7a.xml (deflated 84%)
2025-12-04T16:18:56.5597017Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-1802f570a905faf5.xml (deflated 85%)
2025-12-04T16:18:56.5598472Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ca420a576680224b.xml (deflated 84%)
2025-12-04T16:18:56.5599938Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-9a9f08c6e10d54f7.xml (deflated 84%)
2025-12-04T16:18:56.5601603Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-4c59271afe170d67.xml (deflated 85%)
2025-12-04T16:18:56.5603174Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-bb71b131031d8408.xml (deflated 84%)
2025-12-04T16:18:56.5604637Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-565cf24db94440d1.xml (deflated 84%)
2025-12-04T16:18:56.5606093Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-607f169455f7ccc0.xml (deflated 85%)
2025-12-04T16:18:56.5607538Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-531db397873a40b2.xml (deflated 84%)
2025-12-04T16:18:56.5609007Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a6ed46f8a6f71ef7.xml (deflated 84%)
2025-12-04T16:18:56.5610473Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-8ab81f77c2cb5952.xml (deflated 85%)
2025-12-04T16:18:56.5611933Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-38d9a64e046ee91f.xml (deflated 84%)
2025-12-04T16:18:56.5613387Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-747a72e37803dfe4.xml (deflated 84%)
2025-12-04T16:18:56.5614820Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-54023c099f6c1322.xml (deflated 85%)
2025-12-04T16:18:56.5616343Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-ad9ca42cc99e9c7e.xml (deflated 84%)
2025-12-04T16:18:56.5617802Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-068870c4e7b35c60.xml (deflated 84%)
2025-12-04T16:18:56.5619290Z   adding: test/test-reports/python-pytest/inductor.test_cuda_select_algorithm/inductor.test_cuda_select_algorithm-a0d0acf02d82ecbb.xml (deflated 28%)
2025-12-04T16:18:56.5620834Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-a2f9525a35872883.xml (deflated 75%)
2025-12-04T16:18:56.5622174Z   adding: test/test-reports/python-pytest/inductor.test_deterministic/inductor.test_deterministic-6b09493f63855de7.xml (deflated 61%)
2025-12-04T16:18:56.5623544Z   adding: test/test-reports/python-pytest/inductor.test_extension_backend/inductor.test_extension_backend-107c721ddd062adf.xml (deflated 58%)
2025-12-04T16:18:56.5624891Z   adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-6880425f749978d6.xml (deflated 85%)
2025-12-04T16:18:56.5626171Z   adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-469bba077eb48143.xml (deflated 81%)
2025-12-04T16:18:56.5627471Z   adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-4093a29cc92449a3.xml (deflated 81%)
2025-12-04T16:18:56.5628774Z   adding: test/test-reports/python-pytest/inductor.test_native_matmul/inductor.test_native_matmul-f7b8d41d555aa509.xml (deflated 79%)
2025-12-04T16:18:56.5630077Z   adding: test/test-reports/python-pytest/dynamo.test_fx_graph_runnable/dynamo.test_fx_graph_runnable-0790c18290928611.xml (deflated 87%)
2025-12-04T16:18:56.5631315Z   adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-692fd365c2b33f50.xml (deflated 92%)
2025-12-04T16:18:56.5632469Z   adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-8c32992e913c2c64.xml (deflated 92%)
2025-12-04T16:18:56.5633728Z   adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-73235157d9df4ae2.xml (deflated 92%)
2025-12-04T16:18:56.5634892Z   adding: test/test-reports/python-pytest/inductor.test_memory/inductor.test_memory-9741d261d282c9ae.xml (deflated 83%)
2025-12-04T16:18:56.5636044Z   adding: test/test-reports/python-pytest/dynamo.test_streams/dynamo.test_streams-061202c25215a4da.xml (deflated 84%)
2025-12-04T16:18:56.5637284Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-ad02460068a39927.xml (deflated 89%)
2025-12-04T16:18:56.5649706Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-e60f88ff4be47487.xml (deflated 95%)
2025-12-04T16:18:56.5665152Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-2d7921f0967c562b.xml (deflated 95%)
2025-12-04T16:18:56.5666540Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-38b03205b4b4e8b2.xml (deflated 87%)
2025-12-04T16:18:56.5682221Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-4b997b321b918bd4.xml (deflated 95%)
2025-12-04T16:18:56.5697792Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-c0ee399e0a993179.xml (deflated 95%)
2025-12-04T16:18:56.5699304Z   adding: test/test-reports/python-pytest/inductor.test_unbacked_symints/inductor.test_unbacked_symints-a04a8b8b31c4f983.xml (deflated 86%)
2025-12-04T16:18:56.5700748Z   adding: test/test-reports/python-pytest/inductor.test_scatter_optimization/inductor.test_scatter_optimization-b27b3789d1f96ec3.xml (deflated 81%)
2025-12-04T16:18:56.5702929Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-25ac053e9312843a.xml (deflated 93%)
2025-12-04T16:18:56.5704423Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-44c34e945447da70.xml (deflated 82%)
2025-12-04T16:18:56.5705844Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-faae5acc9f254e31.xml (deflated 82%)
2025-12-04T16:18:56.5707400Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1475705e30056d51.xml (deflated 82%)
2025-12-04T16:18:56.5708812Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-91702530804e6018.xml (deflated 82%)
2025-12-04T16:18:56.5710219Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5a377a8e3e546caa.xml (deflated 82%)
2025-12-04T16:18:56.5711634Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2d9eb46c30fffb97.xml (deflated 85%)
2025-12-04T16:18:56.5713041Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b2fcdf54f0dd8b56.xml (deflated 85%)
2025-12-04T16:18:56.5714451Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e6655594e475c158.xml (deflated 85%)
2025-12-04T16:18:56.5715866Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67b37ef947e223df.xml (deflated 85%)
2025-12-04T16:18:56.5717274Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5d8e49cfad949fb4.xml (deflated 82%)
2025-12-04T16:18:56.5718683Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b32d481ee6a300b7.xml (deflated 82%)
2025-12-04T16:18:56.5720074Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-de77de01625a8457.xml (deflated 82%)
2025-12-04T16:18:56.5721500Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ac8e542231b9ece8.xml (deflated 82%)
2025-12-04T16:18:56.5723009Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8a7277668f29c6c0.xml (deflated 82%)
2025-12-04T16:18:56.5724421Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cc01ae0bb83689a0.xml (deflated 82%)
2025-12-04T16:18:56.5725822Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-99d5fd7f63dbe293.xml (deflated 82%)
2025-12-04T16:18:56.5727229Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-826bb35711c419f6.xml (deflated 82%)
2025-12-04T16:18:56.5728638Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e5e187e59c02465d.xml (deflated 82%)
2025-12-04T16:18:56.5730053Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4a8119bc665e27c0.xml (deflated 82%)
2025-12-04T16:18:56.5731454Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2025dabe1cea3938.xml (deflated 82%)
2025-12-04T16:18:56.5732859Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-9d80dad9de413e50.xml (deflated 86%)
2025-12-04T16:18:56.5734257Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d8878b3838c421bc.xml (deflated 85%)
2025-12-04T16:18:56.5735650Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ccdefc43a9a17fe4.xml (deflated 85%)
2025-12-04T16:18:56.5737052Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f6f73f3414e84f03.xml (deflated 84%)
2025-12-04T16:18:56.5738518Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6fa30bf2f2d5eb51.xml (deflated 82%)
2025-12-04T16:18:56.5739955Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-38918fbd281ed213.xml (deflated 82%)
2025-12-04T16:18:56.5741420Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-1f043ea296196952.xml (deflated 84%)
2025-12-04T16:18:56.5742811Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5fb869340c48ef2f.xml (deflated 82%)
2025-12-04T16:18:56.5744216Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0d5e946f00308484.xml (deflated 82%)
2025-12-04T16:18:56.5745615Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c576f4628ae22849.xml (deflated 84%)
2025-12-04T16:18:56.5747012Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-33928f913f155d05.xml (deflated 82%)
2025-12-04T16:18:56.5748403Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-56a939c64c979699.xml (deflated 82%)
2025-12-04T16:18:56.5749810Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d3b58dae1e6fa80b.xml (deflated 82%)
2025-12-04T16:18:56.5751218Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ece32ee31ed5f94b.xml (deflated 82%)
2025-12-04T16:18:56.5752633Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ff6bee4ccf71b3b1.xml (deflated 82%)
2025-12-04T16:18:56.5754034Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-79a731795b247695.xml (deflated 86%)
2025-12-04T16:18:56.5755424Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-964095f569ab5f18.xml (deflated 85%)
2025-12-04T16:18:56.5756845Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c7f8a2bcbf5a7d94.xml (deflated 85%)
2025-12-04T16:18:56.5758260Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b44a26383ab5bf86.xml (deflated 82%)
2025-12-04T16:18:56.5759671Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-7c4c7b2c97f5ece3.xml (deflated 82%)
2025-12-04T16:18:56.5761074Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35670228d9257748.xml (deflated 82%)
2025-12-04T16:18:56.5762546Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c465169b2a187708.xml (deflated 82%)
2025-12-04T16:18:56.5763951Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2b01ab5056f11e9c.xml (deflated 82%)
2025-12-04T16:18:56.5765347Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-16aca9496f35b1a4.xml (deflated 82%)
2025-12-04T16:18:56.5766741Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-193d78131cdd083a.xml (deflated 82%)
2025-12-04T16:18:56.5768135Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dab7c947d86aa9a6.xml (deflated 82%)
2025-12-04T16:18:56.5769537Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a0e52521b9f6fa85.xml (deflated 82%)
2025-12-04T16:18:56.5770969Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-5bf2204027ce2523.xml (deflated 82%)
2025-12-04T16:18:56.5772360Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d6d9569795b0b902.xml (deflated 82%)
2025-12-04T16:18:56.5773840Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4c910b821c44d2f5.xml (deflated 82%)
2025-12-04T16:18:56.5775235Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-95077883b5abbff3.xml (deflated 82%)
2025-12-04T16:18:56.5776627Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4d3bae777d67a79f.xml (deflated 82%)
2025-12-04T16:18:56.5778245Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-118ad8744f1d4d27.xml (deflated 82%)
2025-12-04T16:18:56.5779832Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61456af580a4b7ac.xml (deflated 82%)
2025-12-04T16:18:56.5781365Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e90a690ff72dc1ab.xml (deflated 82%)
2025-12-04T16:18:56.5782928Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6357b547ca746444.xml (deflated 82%)
2025-12-04T16:18:56.5784429Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e10adef85f4d6151.xml (deflated 83%)
2025-12-04T16:18:56.5785966Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-3c2c7e3f96ee06db.xml (deflated 82%)
2025-12-04T16:18:56.5787542Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-111a9f95bebe1e39.xml (deflated 82%)
2025-12-04T16:18:56.5789079Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-87f44cfa0e8a9d8f.xml (deflated 82%)
2025-12-04T16:18:56.5790565Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ffc35ad917f63350.xml (deflated 82%)
2025-12-04T16:18:56.5792181Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bb2bca61f02d857f.xml (deflated 82%)
2025-12-04T16:18:56.5793714Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-17f448aea025f304.xml (deflated 85%)
2025-12-04T16:18:56.5795266Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85582e9ee40ebc55.xml (deflated 85%)
2025-12-04T16:18:56.5796833Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c795322010e61bce.xml (deflated 85%)
2025-12-04T16:18:56.5798314Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-af1ce6171d14e609.xml (deflated 84%)
2025-12-04T16:18:56.5799860Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-00b52dc1e610ac68.xml (deflated 82%)
2025-12-04T16:18:56.5801574Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-40be700c41c1be61.xml (deflated 82%)
2025-12-04T16:18:56.5803181Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-063cd6c16f492c0b.xml (deflated 82%)
2025-12-04T16:18:56.5804763Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cdb46a62f836b20.xml (deflated 82%)
2025-12-04T16:18:56.5806320Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-50364e1db5a413f2.xml (deflated 82%)
2025-12-04T16:18:56.5807849Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-329d5d08d886772a.xml (deflated 83%)
2025-12-04T16:18:56.5809472Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8e3e317a92830ba6.xml (deflated 82%)
2025-12-04T16:18:56.5811094Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-fba34ccbfe47be41.xml (deflated 82%)
2025-12-04T16:18:56.5812575Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-67eecf299b49620e.xml (deflated 82%)
2025-12-04T16:18:56.5814161Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-689365daff97a217.xml (deflated 82%)
2025-12-04T16:18:56.5815702Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-61d7df0dfd715866.xml (deflated 82%)
2025-12-04T16:18:56.5817272Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-bbb315e2c7566474.xml (deflated 88%)
2025-12-04T16:18:56.5818834Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2cbdafed15e10f46.xml (deflated 81%)
2025-12-04T16:18:56.5820326Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-250d1e9631b51e82.xml (deflated 81%)
2025-12-04T16:18:56.5821865Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-659d038e96b5f102.xml (deflated 91%)
2025-12-04T16:18:56.5823411Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-52f302a009c99a45.xml (deflated 82%)
2025-12-04T16:18:56.5824955Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-edb5da82dbb96991.xml (deflated 82%)
2025-12-04T16:18:56.5826523Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-26ee580f1806e0f2.xml (deflated 90%)
2025-12-04T16:18:56.5828015Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b7cfd41a69868cc6.xml (deflated 82%)
2025-12-04T16:18:56.5829555Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4573f1e428dcb095.xml (deflated 82%)
2025-12-04T16:18:56.5831106Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2c1257cd859214a9.xml (deflated 85%)
2025-12-04T16:18:56.5832642Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6b4b9f12b6851f04.xml (deflated 82%)
2025-12-04T16:18:56.5834202Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-469eeaa86aae0ce8.xml (deflated 82%)
2025-12-04T16:18:56.5835706Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f89f3afb1f628785.xml (deflated 85%)
2025-12-04T16:18:56.5837254Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85fcc5c00efd74bd.xml (deflated 85%)
2025-12-04T16:18:56.5838804Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-dcb0b47762861151.xml (deflated 85%)
2025-12-04T16:18:56.5840349Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31949d00d4596283.xml (deflated 84%)
2025-12-04T16:18:56.5841836Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b6b2b2997a48fffb.xml (deflated 82%)
2025-12-04T16:18:56.5843509Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-b63fc96940c5dfca.xml (deflated 82%)
2025-12-04T16:18:56.5845127Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-31833c8bcf86882f.xml (deflated 83%)
2025-12-04T16:18:56.5846727Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6dc75b6b5f29fbb9.xml (deflated 82%)
2025-12-04T16:18:56.5848308Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d97c974b9c50bec3.xml (deflated 82%)
2025-12-04T16:18:56.5849795Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-588d96b64bf97b8d.xml (deflated 84%)
2025-12-04T16:18:56.5851325Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-78303b7c44b57e72.xml (deflated 82%)
2025-12-04T16:18:56.5852899Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0fe2928d1b5c12d6.xml (deflated 82%)
2025-12-04T16:18:56.5854432Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-85a344e8e648e5ca.xml (deflated 82%)
2025-12-04T16:18:56.5855906Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d0617f72a4b97751.xml (deflated 82%)
2025-12-04T16:18:56.5857486Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e1acf558219bc739.xml (deflated 82%)
2025-12-04T16:18:56.5859015Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-d7d30d97e183551e.xml (deflated 85%)
2025-12-04T16:18:56.5860521Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-42d654f8293abc5a.xml (deflated 82%)
2025-12-04T16:18:56.5862117Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-ee7c83ecdc672647.xml (deflated 82%)
2025-12-04T16:18:56.5863597Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0ceb6628ed982867.xml (deflated 82%)
2025-12-04T16:18:56.5865101Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8928a6b00b051b8.xml (deflated 82%)
2025-12-04T16:18:56.5866503Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-638b7d3a6684657f.xml (deflated 82%)
2025-12-04T16:18:56.5867925Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-28d8e196fd24a123.xml (deflated 86%)
2025-12-04T16:18:56.5869363Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-618da663b64859ce.xml (deflated 85%)
2025-12-04T16:18:56.5870791Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-0193ecefca06b5b7.xml (deflated 85%)
2025-12-04T16:18:56.5872222Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-94e72f0552a6d934.xml (deflated 85%)
2025-12-04T16:18:56.5873641Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c8e966a64a8d91b0.xml (deflated 85%)
2025-12-04T16:18:56.5875066Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6498f30a7931ed78.xml (deflated 85%)
2025-12-04T16:18:56.5876490Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-35a9228a36f00ca8.xml (deflated 82%)
2025-12-04T16:18:56.5877971Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6d31e9c231a839ae.xml (deflated 82%)
2025-12-04T16:18:56.5879380Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-15938e0b51a5f238.xml (deflated 82%)
2025-12-04T16:18:56.5880899Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-2f0e8060bc3a964c.xml (deflated 84%)
2025-12-04T16:18:56.5882402Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-a7bcf286e5b1017b.xml (deflated 82%)
2025-12-04T16:18:56.5883826Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-c21fd94b2a445d75.xml (deflated 82%)
2025-12-04T16:18:56.5885254Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-e84bcc8fc890320e.xml (deflated 82%)
2025-12-04T16:18:56.5886663Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-71b1acdff50f0444.xml (deflated 82%)
2025-12-04T16:18:56.5888094Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-f8b74fab1a7c01df.xml (deflated 82%)
2025-12-04T16:18:56.5889526Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-cda23e9a2cebd271.xml (deflated 82%)
2025-12-04T16:18:56.5890950Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-6ef0f921a65804fa.xml (deflated 82%)
2025-12-04T16:18:56.5892360Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-4beacf6124c4825f.xml (deflated 82%)
2025-12-04T16:18:56.5893794Z   adding: test/test-reports/python-pytest/inductor.test_mix_order_reduction/inductor.test_mix_order_reduction-8d5eb132574c3bbb.xml (deflated 77%)
2025-12-04T16:18:56.6060177Z   adding: test/test-reports/python-pytest/test_transformers/test_transformers-314991beba6d5b67.xml (deflated 99%)
2025-12-04T16:18:56.6078822Z   adding: test/test-reports/python-pytest/test_autograd/test_autograd-9411f135e03cf921.xml (deflated 88%)
2025-12-04T16:18:56.6103315Z   adding: test/test-reports/python-pytest/test_sparse/test_sparse-8ac5504ea5d63e83.xml (deflated 95%)
2025-12-04T16:18:56.6112800Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-b93e416e4714efc8.xml (deflated 91%)
2025-12-04T16:18:56.6123741Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-298d565a78b93d88.xml (deflated 91%)
2025-12-04T16:18:56.6133279Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-da1c924c8984f5ba.xml (deflated 91%)
2025-12-04T16:18:56.6143293Z   adding: test/test-reports/python-pytest/test_decomp/test_decomp-20c517b051912976.xml (deflated 91%)
2025-12-04T16:18:56.6296585Z   adding: test/test-reports/python-pytest/test_meta/test_meta-0566a97fe52d3e43.xml (deflated 96%)
2025-12-04T16:18:56.6310598Z   adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-c099bcb3f2a041ec.xml (deflated 96%)
2025-12-04T16:18:56.6344594Z   adding: test/test-reports/python-pytest/test_nestedtensor/test_nestedtensor-b4c65009171fef32.xml (deflated 98%)
2025-12-04T16:18:56.6413753Z   adding: test/test-reports/python-pytest/test_ops/test_ops-9d1debb5033aecec.xml (deflated 95%)
2025-12-04T16:18:56.6487397Z   adding: test/test-reports/python-pytest/test_ops/test_ops-9b78a46860708967.xml (deflated 95%)
2025-12-04T16:18:56.6523429Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-bd6912e48e96c8e4.xml (deflated 93%)
2025-12-04T16:18:56.6557226Z   adding: test/test-reports/python-pytest/functorch.test_ops/functorch.test_ops-da40a8ab5c416f48.xml (deflated 93%)
2025-12-04T16:18:56.6570483Z   adding: test/test-reports/python-pytest/inductor.test_cpu_repro/inductor.test_cpu_repro-5dd5f1708cbcb0aa.xml (deflated 96%)
2025-12-04T16:18:56.6577976Z   adding: test/test-reports/python-pytest/inductor.test_mkldnn_pattern_matcher/inductor.test_mkldnn_pattern_matcher-85c358a1ca92a817.xml (deflated 94%)
2025-12-04T16:18:56.6579448Z   adding: test/test-reports/python-pytest/inductor.test_cpu_select_algorithm/inductor.test_cpu_select_algorithm-99091fae53aceb8e.xml (deflated 28%)
2025-12-04T16:18:56.6583335Z   adding: test/test-reports/python-pytest/test_custom_ops/test_custom_ops-7a9f392fc312693f.xml (deflated 90%)
2025-12-04T16:18:56.6584474Z   adding: test/test-reports/python-pytest/inductor.test_analysis/inductor.test_analysis-ef614f735877f798.xml (deflated 93%)
2025-12-04T16:18:56.6585672Z   adding: test/test-reports/python-pytest/inductor.test_pad_mm/inductor.test_pad_mm-cc450381ece2a8f9.xml (deflated 91%)
2025-12-04T16:18:56.6586906Z   adding: test/test-reports/python-pytest/inductor.test_triton_syntax/inductor.test_triton_syntax-898dc985a45c41c6.xml (deflated 61%)
2025-12-04T16:18:56.6588323Z   adding: test/test-reports/python-pytest/inductor.test_triton_extension_backend/inductor.test_triton_extension_backend-1a18cee9beef4f55.xml (deflated 28%)
2025-12-04T16:18:56.6589758Z   adding: test/test-reports/python-pytest/test_sparse_semi_structured/test_sparse_semi_structured-4f8d9547a4d851ec.xml (deflated 93%)
2025-12-04T16:18:56.6591099Z   adding: test/test-reports/python-pytest/inductor.test_op_completeness/inductor.test_op_completeness-7d3f24a957250fde.xml (deflated 68%)
2025-12-04T16:18:56.6592467Z   adding: test/test-reports/python-pytest/inductor.test_subgraph_choice/inductor.test_subgraph_choice-2437d978fade4f96.xml (deflated 59%)
2025-12-04T16:18:56.6593864Z   adding: test/test-reports/python-pytest/inductor.test_cutedsl_grouped_mm/inductor.test_cutedsl_grouped_mm-9a993ae92ea5ca0a.xml (deflated 95%)
2025-12-04T16:18:56.6595266Z   adding: test/test-reports/python-pytest/inductor.test_cpp_wrapper_hipify/inductor.test_cpp_wrapper_hipify-5078284f3b2f2998.xml (deflated 60%)
2025-12-04T16:18:56.6596642Z   adding: test/test-reports/python-pytest/inductor.test_inductor_utils/inductor.test_inductor_utils-fea0c873b74a6a46.xml (deflated 52%)
2025-12-04T16:18:56.6598138Z   adding: test/test-reports/python-pytest/inductor.test_template_heuristics_registry/inductor.test_template_heuristics_registry-f03db733e7237771.xml (deflated 71%)
2025-12-04T16:18:56.6599617Z   adding: test/test-reports/python-pytest/inductor.test_async_compile/inductor.test_async_compile-26761717acf278af.xml (deflated 86%)
2025-12-04T16:18:56.6601102Z   adding: test/test-reports/python-pytest/dynamo.test_deque_reconstruct/dynamo.test_deque_reconstruct-87f577525bf4c9e0.xml (deflated 68%)
2025-12-04T16:18:56.6602422Z   adding: test/test-reports/python-pytest/inductor.test_utils/inductor.test_utils-906071f9e5aa0510.xml (deflated 64%)
2025-12-04T16:18:56.6603622Z   adding: test/test-reports/python-pytest/inductor.test_indexing/inductor.test_indexing-059deccacca9b28a.xml (deflated 78%)
2025-12-04T16:18:56.6604983Z   adding: test/test-reports/python-pytest/inductor.test_inductor_annotations/inductor.test_inductor_annotations-a710efcfde282e90.xml (deflated 68%)
2025-12-04T16:18:56.6606395Z   adding: test/test-reports/python-pytest/inductor.test_compile_worker/inductor.test_compile_worker-2b558a130ccb3642.xml (deflated 83%)
2025-12-04T16:18:56.6607640Z   adding: test/test-reports/python-pytest/dynamo.test_einops/dynamo.test_einops-c0dc34cc00c52c06.xml (deflated 71%)
2025-12-04T16:18:56.6608929Z   adding: test/test-reports/python-pytest/inductor.test_external_callables/inductor.test_external_callables-00ffeed03000c0d3.xml (deflated 73%)
2025-12-04T16:18:56.6622850Z   adding: test/test-reports/python-pytest/test_testing/test_testing-69992b4cd6aabeac.xml (deflated 96%)
2025-12-04T16:18:56.6624027Z   adding: test/test-reports/python-pytest/dynamo.test_fx_passes_pre_grad/dynamo.test_fx_passes_pre_grad-48a63e950c2eb9b4.xml (deflated 35%)
2025-12-04T16:18:56.6659643Z   adding: test/test-reports/python-pytest/export.test_strict_export_v2/export.test_strict_export_v2-e896fc6c8f5f5413.xml (deflated 95%)
2025-12-04T16:18:56.6661173Z   adding: test/test-reports/python-pytest/export.test_functionalized_assertions/export.test_functionalized_assertions-9948d5e6dd7869dd.xml (deflated 53%)
2025-12-04T16:18:56.6662713Z   adding: test/test-reports/python-pytest/inductor.test_selective_lowering/inductor.test_selective_lowering-3443f84bc8e0d9ea.xml (deflated 67%)
2025-12-04T16:18:56.6664118Z   adding: test/test-reports/python-pytest/dynamo.test_base_output/dynamo.test_base_output-444b9e9b2896f7db.xml (deflated 82%)
2025-12-04T16:18:56.6671149Z   adding: test/test-reports/python-pytest/export.test_serialize/export.test_serialize-c63da72846ec1ca6.xml (deflated 94%)
2025-12-04T16:18:56.6672519Z   adding: test/test-reports/python-pytest/inductor.test_move_constructors_to_gpu/inductor.test_move_constructors_to_gpu-68ab4975dd79b7d5.xml (deflated 81%)
2025-12-04T16:18:56.6673937Z   adding: test/test-reports/python-pytest/inductor.test_remote_cache/inductor.test_remote_cache-3da887a4cab9e620.xml (deflated 59%)
2025-12-04T16:18:56.6675368Z   adding: test/test-reports/python-pytest/inductor.test_coordinate_descent_tuner/inductor.test_coordinate_descent_tuner-6824af132d005f6c.xml (deflated 63%)
2025-12-04T16:18:56.6676830Z   adding: test/test-reports/python-pytest/inductor.test_inplace_padding/inductor.test_inplace_padding-f371eec712e8c5c4.xml (deflated 84%)
2025-12-04T16:18:56.6678177Z   adding: test/test-reports/python-pytest/inductor.test_cudacodecache/inductor.test_cudacodecache-2709b5a1f66ec7aa.xml (deflated 62%)
2025-12-04T16:18:56.6679519Z   adding: test/test-reports/python-pytest/inductor.test_minifier_utils/inductor.test_minifier_utils-8db87fb30c1e8868.xml (deflated 52%)
2025-12-04T16:18:56.6680818Z   adding: test/test-reports/python-pytest/inductor.test_debug_trace/inductor.test_debug_trace-f206ac6f91b833b9.xml (deflated 61%)
2025-12-04T16:18:56.6693978Z   adding: test/test-reports/python-pytest/inductor.test_foreach/inductor.test_foreach-dd7ec36049f8e4a8.xml (deflated 97%)
2025-12-04T16:18:56.6704028Z   adding: test/test-reports/python-pytest/inductor.test_cache/inductor.test_cache-b64adfa949e710fa.xml (deflated 96%)
2025-12-04T16:18:56.6705154Z   adding: test/test-reports/python-pytest/dynamo.test_config/dynamo.test_config-b59ec438e7f139b2.xml (deflated 68%)
2025-12-04T16:18:56.6706377Z   adding: test/test-reports/python-pytest/dynamo.test_metrics_context/dynamo.test_metrics_context-8c54ce911c65a1d8.xml (deflated 76%)
2025-12-04T16:18:56.6707614Z   adding: test/test-reports/python-pytest/export.test_package/export.test_package-ca7d9252e60c0b85.xml (deflated 62%)
2025-12-04T16:18:56.6708731Z   adding: test/test-reports/python-pytest/dynamo.test_nops/dynamo.test_nops-06a6514c719bc621.xml (deflated 62%)
2025-12-04T16:18:56.6710041Z   adding: test/test-reports/python-pytest/inductor.test_graph_transform_observer/inductor.test_graph_transform_observer-7fa27194a995b7de.xml (deflated 37%)
2025-12-04T16:18:56.6711695Z   adding: test/test-reports/python-pytest/export.test_db/export.test_db-656b1fb51498c2a2.xml (deflated 87%)
2025-12-04T16:18:56.6712880Z   adding: test/test-reports/python-pytest/dynamo.test_export_mutations/dynamo.test_export_mutations-ac0f456ff528df13.xml (deflated 77%)
2025-12-04T16:18:56.6714148Z   adding: test/test-reports/python-pytest/inductor.test_config/inductor.test_config-891cd7b3aeb3b5ed.xml (deflated 75%)
2025-12-04T16:18:56.6715392Z   adding: test/test-reports/python-pytest/inductor.test_dependencies/inductor.test_dependencies-0956f606bfbef853.xml (deflated 70%)
2025-12-04T16:18:56.6827298Z   adding: test/test-reports/python-pytest/inductor.test_fuzzer/inductor.test_fuzzer-848012b685a936d2.xml (deflated 88%)
2025-12-04T16:18:56.6828448Z   adding: test/test-reports/python-pytest/dynamo.test_global/dynamo.test_global-3f6b17294db437b1.xml (deflated 81%)
2025-12-04T16:18:56.6848970Z   adding: test/test-reports/python-pytest/inductor.test_control_flow/inductor.test_control_flow-f876791985cb5a1a.xml (deflated 97%)
2025-12-04T16:18:56.6850591Z   adding: test/test-reports/python-pytest/dynamo.test_cudagraphs/dynamo.test_cudagraphs-f8e6c8e1da70ac34.xml (deflated 85%)
2025-12-04T16:18:56.6851903Z   adding: test/test-reports/python-pytest/inductor.test_alignment/inductor.test_alignment-e6a1f3fd35374247.xml (deflated 90%)
2025-12-04T16:18:56.6853164Z   adding: test/test-reports/python-pytest/dynamo.test_profiler/dynamo.test_profiler-4c5fdfc03a5c6f47.xml (deflated 72%)
2025-12-04T16:18:56.6855800Z   adding: test/test-reports/python-pytest/dynamo.test_guard_serialization/dynamo.test_guard_serialization-ad1a0cf4b0a5764d.xml (deflated 86%)
2025-12-04T16:18:56.6861193Z   adding: test/test-reports/python-pytest/dynamo.test_dicts/dynamo.test_dicts-e677e083bbe15d92.xml (deflated 89%)
2025-12-04T16:18:56.6862495Z   adding: test/test-reports/python-pytest/dynamo.test_optimizers/dynamo.test_optimizers-a32616c44840c4cb.xml (deflated 66%)
2025-12-04T16:18:56.6864914Z   adding: test/test-reports/python-pytest/export.test_torchbind/export.test_torchbind-5ef54f6c3fc7e6e3.xml (deflated 92%)
2025-12-04T16:18:56.6866203Z   adding: test/test-reports/python-pytest/dynamo.test_python_dispatcher/dynamo.test_python_dispatcher-323f6251761a8aee.xml (deflated 77%)
2025-12-04T16:18:56.6867422Z   adding: test/test-reports/python-pytest/export.test_swap/export.test_swap-6940316a22c03b83.xml (deflated 93%)
2025-12-04T16:18:56.6868808Z   adding: test/test-reports/python-pytest/export.test_unflatten/export.test_unflatten-ab02733f663f09d1.xml (deflated 92%)
2025-12-04T16:18:56.6870146Z   adding: test/test-reports/python-pytest/dynamo.test_verify_correctness/dynamo.test_verify_correctness-a822576ee13d2405.xml (deflated 64%)
2025-12-04T16:18:56.6873596Z   adding: test/test-reports/python-pytest/inductor.test_fxir_backend/inductor.test_fxir_backend-0ddc410876940750.xml (deflated 91%)
2025-12-04T16:18:56.6875378Z   adding: test/test-reports/python-pytest/dynamo.test_structured_trace/dynamo.test_structured_trace-c4539ed3e1c3f3d2.xml (deflated 87%)
2025-12-04T16:18:56.6876627Z   adding: test/test-reports/python-pytest/dynamo.test_torchrec/dynamo.test_torchrec-a739d4d8dd7fe6db.xml (deflated 28%)
2025-12-04T16:18:56.6877911Z   adding: test/test-reports/python-pytest/test_model_exports_to_core_aten/test_model_exports_to_core_aten-ca8aa6cdcebd4c55.xml (deflated 58%)
2025-12-04T16:18:56.6879296Z   adding: test/test-reports/python-pytest/dynamo.test_precompile_context/dynamo.test_precompile_context-d3b456bb7c9f74bf.xml (deflated 76%)
2025-12-04T16:18:56.6880610Z   adding: test/test-reports/python-pytest/dynamo.test_trace_rules/dynamo.test_trace_rules-cb7e3d7c5a436002.xml (deflated 67%)
2025-12-04T16:18:56.6881792Z   adding: test/test-reports/python-pytest/export.test_upgrader/export.test_upgrader-e574684e7a6f5e02.xml (deflated 69%)
2025-12-04T16:18:56.6882982Z   adding: test/test-reports/python-pytest/dynamo.test_hooks/dynamo.test_hooks-05127548b561fef1.xml (deflated 85%)
2025-12-04T16:18:56.6884128Z   adding: test/test-reports/python-pytest/dynamo.test_generator/dynamo.test_generator-92f221726c5985b1.xml (deflated 92%)
2025-12-04T16:18:56.6885315Z   adding: test/test-reports/python-pytest/export.test_verifier/export.test_verifier-edb630c9e71930f9.xml (deflated 75%)
2025-12-04T16:18:56.6886479Z   adding: test/test-reports/python-pytest/export.test_sparse/export.test_sparse-c54c4a64a1413ccc.xml (deflated 90%)
2025-12-04T16:18:56.6887588Z   adding: test/test-reports/python-pytest/functorch.test_ac/functorch.test_ac-9bf963042854be08.xml (deflated 73%)
2025-12-04T16:18:56.6888689Z   adding: test/test-reports/python-pytest/test_out_dtype_op/test_out_dtype_op-014adb2ecaedb28b.xml (deflated 77%)
2025-12-04T16:18:56.6893213Z   adding: test/test-reports/python-pytest/torch_np.test_ufuncs_basic/torch_np.test_ufuncs_basic-614b306d768a8662.xml (deflated 97%)
2025-12-04T16:18:56.6894452Z   adding: test/test-reports/python-pytest/lazy.test_step_closures/lazy.test_step_closures-4de838954d52331d.xml (deflated 65%)
2025-12-04T16:18:56.6895726Z   adding: test/test-reports/python-pytest/functorch.dim.test_getsetitem/functorch.dim.test_getsetitem-d5e6ac7560412ef9.xml (deflated 85%)
2025-12-04T16:18:56.6924275Z   adding: test/test-reports/python-pytest/test_fx/test_fx-d5755757c0de9fe5.xml (deflated 95%)
2025-12-04T16:18:56.6925239Z   adding: test/test-reports/python-pytest/test_autocast/test_autocast-fd8082499cdeffdb.xml (deflated 82%)
2025-12-04T16:18:56.6926339Z   adding: test/test-reports/python-pytest/test_logging/test_logging-07e1a05cccd3a8b9.xml (deflated 37%)
2025-12-04T16:18:56.6928444Z   adding: test/test-reports/python-pytest/test_python_dispatch/test_python_dispatch-e290291b25b2a739.xml (deflated 86%)
2025-12-04T16:18:56.6929581Z   adding: test/test-reports/python-pytest/nn.test_lazy_modules/nn.test_lazy_modules-90c11bd89c9c9697.xml (deflated 89%)
2025-12-04T16:18:56.6930674Z   adding: test/test-reports/python-pytest/nn.test_pruning/nn.test_pruning-e4f9b7a61d3080de.xml (deflated 87%)
2025-12-04T16:18:56.6931690Z   adding: test/test-reports/python-pytest/test_monitor/test_monitor-821063f2b7915ea1.xml (deflated 68%)
2025-12-04T16:18:56.6932753Z   adding: test/test-reports/python-pytest/test_cuda_sanitizer/test_cuda_sanitizer-32e74fc9c7695511.xml (deflated 86%)
2025-12-04T16:18:56.6933866Z   adding: test/test-reports/python-pytest/test_bundled_inputs/test_bundled_inputs-35f6835618e9721e.xml (deflated 73%)
2025-12-04T16:18:56.6937695Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_numeric/torch_np.numpy_tests.core.test_numeric-1a155fd517c13e25.xml (deflated 93%)
2025-12-04T16:18:56.6955465Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.core.test_multiarray/torch_np.numpy_tests.core.test_multiarray-86fe7342be381be4.xml (deflated 96%)
2025-12-04T16:18:56.6956713Z   adding: test/test-reports/python-pytest/test_itt/test_itt-7f15e1ebb20f1faf.xml (deflated 39%)
2025-12-04T16:18:56.6965434Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_function_base/torch_np.numpy_tests.lib.test_function_base-c71be2950500ec80.xml (deflated 95%)
2025-12-04T16:18:56.6968112Z   adding: test/test-reports/python-pytest/test_masked/test_masked-0947e6a84ac8b531.xml (deflated 96%)
2025-12-04T16:18:56.6970181Z   adding: test/test-reports/python-pytest/test_datapipe/test_datapipe-62d690fc79a0a517.xml (deflated 89%)
2025-12-04T16:18:56.6987176Z   adding: test/test-reports/python-pytest/nn.test_convolution/nn.test_convolution-b018917052e39f95.xml (deflated 97%)
2025-12-04T16:18:56.6990295Z   adding: test/test-reports/python-pytest/test_indexing/test_indexing-f48226185e6ca57a.xml (deflated 91%)
2025-12-04T16:18:56.6992041Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.fft.test_pocketfft/torch_np.numpy_tests.fft.test_pocketfft-bea76ae62a6a548e.xml (deflated 95%)
2025-12-04T16:18:56.6993570Z   adding: test/test-reports/python-pytest/torch_np.numpy_tests.lib.test_shape_base_/torch_np.numpy_tests.lib.test_shape_base_-4cf3761fefa68714.xml (deflated 91%)
2025-12-04T16:18:56.6994941Z   adding: test/test-reports/python-pytest/test_cpp_extensions_jit/test_cpp_extensions_jit-2038af5833d07a07.xml (deflated 83%)
2025-12-04T16:18:56.6996213Z   adding: test/test-reports/python-pytest/profiler.test_python_tracer/profiler.test_python_tracer-4e1c7f97ddacb52a.xml (deflated 64%)
2025-12-04T16:18:56.7000679Z   adding: test/test-reports/python-pytest/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility-c0abede9e59e118f.xml (deflated 96%)
2025-12-04T16:18:56.7005327Z   adding: test/test-reports/python-pytest/distributions.test_distributions/distributions.test_distributions-390f18d46cafc91e.xml (deflated 90%)
2025-12-04T16:18:56.7035486Z ##[group]Run # Remove any previous usage logs if they exist
2025-12-04T16:18:56.7036035Z # Remove any previous usage logs if they exist
2025-12-04T16:18:56.7036461Z rm -f logs-*.zip
2025-12-04T16:18:56.7036890Z zip "logs-${FILE_SUFFIX}.zip" 'usage_log.txt' || true
2025-12-04T16:18:56.7037498Z zip -r "logs-${FILE_SUFFIX}.zip" test/test-reports -i '*.log' || true
2025-12-04T16:18:56.7044454Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T16:18:56.7044888Z env:
2025-12-04T16:18:56.7045138Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:56.7045452Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:56.7045807Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:56.7046549Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:56.7047461Z   FILE_SUFFIX: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427
2025-12-04T16:18:56.7048029Z ##[endgroup]
2025-12-04T16:18:56.7120962Z   adding: usage_log.txt (deflated 58%)
2025-12-04T16:18:56.7196560Z   adding: test/test-reports/inductor.test_aot_inductor_4.6_29241cabee62c0de_.log (deflated 92%)
2025-12-04T16:18:56.7197418Z   adding: test/test-reports/test_autocast_1.1_7cd62703ceb14b05_.log (deflated 76%)
2025-12-04T16:18:56.7213446Z   adding: test/test-reports/inductor.test_torchinductor_dynamic_shapes_1.5_8dad9aa6fdc82df0_.log (deflated 91%)
2025-12-04T16:18:56.7247960Z   adding: test/test-reports/test_fx_1.1_fe3aedf5a60597eb_.log (deflated 92%)
2025-12-04T16:18:56.7269536Z   adding: test/test-reports/inductor.test_torchinductor_dynamic_shapes_5.5_0c7fd80a5a340f9b_.log (deflated 92%)
2025-12-04T16:18:56.7272271Z   adding: test/test-reports/test_cpp_extensions_jit_1.1_53eadff4adfe6cf3_.log (deflated 88%)
2025-12-04T16:18:56.7277712Z   adding: test/test-reports/inductor.test_kernel_benchmark_1.1_1e5eee0d44ae0f1a_.log (deflated 94%)
2025-12-04T16:18:56.7278640Z   adding: test/test-reports/profiler.test_python_tracer_1.1_2f036554f4a33837_.log (deflated 59%)
2025-12-04T16:18:56.7288098Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_3.17_09d50cf3d15b8ee9_.log (deflated 92%)
2025-12-04T16:18:56.7294277Z   adding: test/test-reports/distributions.test_distributions_1.1_10129d86baeaadf5_.log (deflated 90%)
2025-12-04T16:18:56.7303959Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_8.17_f4805f992a426064_.log (deflated 92%)
2025-12-04T16:18:56.7304842Z   adding: test/test-reports/test_logging_1.1_4a28eee8affd86e2_.log (deflated 49%)
2025-12-04T16:18:56.7311631Z   adding: test/test-reports/inductor.test_torchinductor_opinfo_13.17_50bb27b4d6383988_.log (deflated 91%)
2025-12-04T16:18:56.7317737Z   adding: test/test-reports/inductor.test_pattern_matcher_1.1_3ae84ddebdf6dbd7_.log (deflated 93%)
2025-12-04T16:18:56.7361451Z   adding: test/test-reports/inductor.test_cuda_repro_1.1_4fd57cc505de7852_.log (deflated 96%)
2025-12-04T16:18:56.7371273Z   adding: test/test-reports/inductor.test_cudagraph_trees_1.1_054bcfe63a557371_.log (deflated 88%)
2025-12-04T16:18:56.7411347Z   adding: test/test-reports/inductor.test_cuda_select_algorithm_4.5_53b34f2889361847_.log (deflated 97%)
2025-12-04T16:18:56.7412295Z   adding: test/test-reports/inductor.test_deterministic_1.8_262bcacfdd50a1f9_.log (deflated 65%)
2025-12-04T16:18:56.7413188Z   adding: test/test-reports/inductor.test_deterministic_6.8_b1bfd086dab71470_.log (deflated 58%)
2025-12-04T16:18:56.7414117Z   adding: test/test-reports/inductor.test_extension_backend_1.1_057698d7e9793b3b_.log (deflated 56%)
2025-12-04T16:18:56.7416929Z   adding: test/test-reports/inductor.test_native_matmul_1.2_d47deb602d378eb1_.log (deflated 92%)
2025-12-04T16:18:56.7418168Z   adding: test/test-reports/dynamo.test_fx_graph_runnable_1.1_bc88b60e43fe7f12_.log (deflated 80%)
2025-12-04T16:18:56.7438021Z   adding: test/test-reports/inductor.test_memory_1.1_18f1e5893f70119e_.log (deflated 97%)
2025-12-04T16:18:56.7439198Z   adding: test/test-reports/dynamo.test_streams_1.1_834a989fad2ef2e3_.log (deflated 79%)
2025-12-04T16:18:56.7479726Z   adding: test/test-reports/inductor.test_unbacked_symints_1.1_e6e3a96590269886_.log (deflated 96%)
2025-12-04T16:18:56.7480864Z   adding: test/test-reports/inductor.test_scatter_optimization_1.1_7430a249406bb12a_.log (deflated 78%)
2025-12-04T16:18:56.7565016Z   adding: test/test-reports/inductor.test_mix_order_reduction_1.2_f2061367e8c27b7f_.log (deflated 98%)
2025-12-04T16:18:56.7882628Z   adding: test/test-reports/test_transformers_1.1_cd619bbaee31992c_.log (deflated 98%)
2025-12-04T16:18:56.7904937Z   adding: test/test-reports/test_autograd_1.1_343bbb8e8e4f4e62_.log (deflated 88%)
2025-12-04T16:18:56.7946520Z   adding: test/test-reports/test_sparse_1.2_170c4a4cb63931fe_.log (deflated 94%)
2025-12-04T16:18:56.7962410Z   adding: test/test-reports/test_decomp_2.17_4858d88ccf44ed88_.log (deflated 89%)
2025-12-04T16:18:56.7979692Z   adding: test/test-reports/test_decomp_7.17_ecdc7da48044ddba_.log (deflated 89%)
2025-12-04T16:18:56.7995292Z   adding: test/test-reports/test_decomp_12.17_884069b3bca145fc_.log (deflated 89%)
2025-12-04T16:18:56.8012417Z   adding: test/test-reports/test_decomp_17.17_4ba2ec57e0bb6714_.log (deflated 89%)
2025-12-04T16:18:56.8227247Z   adding: test/test-reports/test_meta_5.5_1a0c05f4e7432569_.log (deflated 93%)
2025-12-04T16:18:56.8242876Z   adding: test/test-reports/test_nestedtensor_1.4_6dff2e85dc80cacf_.log (deflated 91%)
2025-12-04T16:18:56.8257695Z   adding: test/test-reports/test_nestedtensor_4.4_fadd9c2633e00561_.log (deflated 92%)
2025-12-04T16:18:56.8343237Z   adding: test/test-reports/test_ops_5.11_352ce2577683b96d_.log (deflated 91%)
2025-12-04T16:18:56.8427231Z   adding: test/test-reports/test_ops_10.11_9feb13593ea58df6_.log (deflated 91%)
2025-12-04T16:18:56.8467701Z   adding: test/test-reports/functorch.test_ops_2.7_066e83f50e6dcbea_.log (deflated 92%)
2025-12-04T16:18:56.8508081Z   adding: test/test-reports/functorch.test_ops_7.7_c87f7efa94ae13b4_.log (deflated 92%)
2025-12-04T16:18:56.8508948Z   adding: test/test-reports/inductor.test_max_autotune_1.1_dc9c21bc2c4ad5fc_.log (deflated 34%)
2025-12-04T16:18:56.8518080Z   adding: test/test-reports/inductor.test_cpu_repro_3.3_41613d465af9d6d5_.log (deflated 93%)
2025-12-04T16:18:56.8521679Z   adding: test/test-reports/test_python_dispatch_1.1_4a43d809046600b7_.log (deflated 87%)
2025-12-04T16:18:56.8525811Z   adding: test/test-reports/inductor.test_mkldnn_pattern_matcher_2.3_52e8559de495a0be_.log (deflated 92%)
2025-12-04T16:18:56.8526800Z   adding: test/test-reports/inductor.test_cpu_select_algorithm_1.1_2b85f4e0fd3f066c_.log (deflated 49%)
2025-12-04T16:18:56.8534605Z   adding: test/test-reports/test_custom_ops_1.1_37d60717605e8cfe_.log (deflated 89%)
2025-12-04T16:18:56.8535820Z   adding: test/test-reports/inductor.test_analysis_1.1_a128307487ad43a3_.log (deflated 85%)
2025-12-04T16:18:56.8536827Z   adding: test/test-reports/inductor.test_pad_mm_1.1_bfb512e8053e306d_.log (deflated 79%)
2025-12-04T16:18:56.8537686Z   adding: test/test-reports/inductor.test_triton_syntax_1.1_cd6b570d7971cca9_.log (deflated 51%)
2025-12-04T16:18:56.8539222Z   adding: test/test-reports/nn.test_lazy_modules_1.1_641ede76abd1387b_.log (deflated 86%)
2025-12-04T16:18:56.8540140Z   adding: test/test-reports/inductor.test_triton_extension_backend_1.1_e218feea67d6cd2a_.log (deflated 50%)
2025-12-04T16:18:56.8541602Z   adding: test/test-reports/test_sparse_semi_structured_1.1_4dd53f61ed651a5b_.log (deflated 87%)
2025-12-04T16:18:56.8542503Z   adding: test/test-reports/inductor.test_op_completeness_1.1_5deb9907383c3460_.log (deflated 65%)
2025-12-04T16:18:56.8543418Z   adding: test/test-reports/inductor.test_subgraph_choice_1.1_927735b69ebf1973_.log (deflated 55%)
2025-12-04T16:18:56.8544356Z   adding: test/test-reports/inductor.test_cutedsl_grouped_mm_1.1_4f25a6335f622148_.log (deflated 89%)
2025-12-04T16:18:56.8545292Z   adding: test/test-reports/inductor.test_cpp_wrapper_hipify_1.1_353d02c262482f20_.log (deflated 61%)
2025-12-04T16:18:56.8546203Z   adding: test/test-reports/inductor.test_inductor_utils_1.1_67afa62609840b86_.log (deflated 56%)
2025-12-04T16:18:56.8547037Z   adding: test/test-reports/nn.test_pruning_1.1_fc4532e556fbe9d9_.log (deflated 81%)
2025-12-04T16:18:56.8547950Z   adding: test/test-reports/inductor.test_template_heuristics_registry_1.1_3f598775c056439a_.log (deflated 71%)
2025-12-04T16:18:56.8548922Z   adding: test/test-reports/inductor.test_async_compile_1.1_887cb91e60faea2f_.log (deflated 68%)
2025-12-04T16:18:56.8549917Z   adding: test/test-reports/dynamo.test_deque_reconstruct_1.1_f8b7d34594077ea6_.log (deflated 63%)
2025-12-04T16:18:56.8550780Z   adding: test/test-reports/inductor.test_utils_1.1_63e5e2174acc542d_.log (deflated 67%)
2025-12-04T16:18:56.8551667Z   adding: test/test-reports/inductor.test_indexing_1.1_2bd025888cab1cf8_.log (deflated 78%)
2025-12-04T16:18:56.8552587Z   adding: test/test-reports/inductor.test_inductor_annotations_1.1_e129b89bdd73962f_.log (deflated 59%)
2025-12-04T16:18:56.8553615Z   adding: test/test-reports/inductor.test_compile_worker_1.1_00f9da717f84f877_.log (deflated 76%)
2025-12-04T16:18:56.8554475Z   adding: test/test-reports/dynamo.test_einops_1.1_fa1def1006f21bae_.log (deflated 59%)
2025-12-04T16:18:56.8555361Z   adding: test/test-reports/inductor.test_external_callables_1.1_532bdcfa274f54bc_.log (deflated 60%)
2025-12-04T16:18:56.8602253Z   adding: test/test-reports/test_testing_1.1_a28c99e40f247370_.log (deflated 94%)
2025-12-04T16:18:56.8603072Z   adding: test/test-reports/dynamo.test_fx_passes_pre_grad_1.1_7c7f9dd585a9f6c9_.log (deflated 53%)
2025-12-04T16:18:56.8630664Z   adding: test/test-reports/export.test_strict_export_v2_1.1_3c4ed2fe1af04b4b_.log (deflated 92%)
2025-12-04T16:18:56.8631495Z   adding: test/test-reports/test_monitor_1.1_60acff8e80cf96a3_.log (deflated 62%)
2025-12-04T16:18:56.8632382Z   adding: test/test-reports/export.test_functionalized_assertions_1.1_7d17ab73392af6b4_.log (deflated 60%)
2025-12-04T16:18:56.8633362Z   adding: test/test-reports/inductor.test_selective_lowering_1.1_e1c78d2a5185c394_.log (deflated 58%)
2025-12-04T16:18:56.8634267Z   adding: test/test-reports/dynamo.test_base_output_1.1_c6d6552f20e02364_.log (deflated 67%)
2025-12-04T16:18:56.8635132Z   adding: test/test-reports/inductor.test_lookup_table_1.1_47a98ebb9baf620f_.log (deflated 6%)
2025-12-04T16:18:56.8638484Z   adding: test/test-reports/export.test_serialize_1.1_aebb5c7eea9352a2_.log (deflated 88%)
2025-12-04T16:18:56.8640704Z   adding: test/test-reports/torch_np.numpy_tests.lib.test_shape_base__1.1_462d874ba4c079f0_.log (deflated 87%)
2025-12-04T16:18:56.8641706Z   adding: test/test-reports/inductor.test_move_constructors_to_gpu_1.1_3373ad77744fe6e4_.log (deflated 70%)
2025-12-04T16:18:56.8642709Z   adding: test/test-reports/inductor.test_remote_cache_1.1_46ddba7c7bb0dd06_.log (deflated 60%)
2025-12-04T16:18:56.8643568Z   adding: test/test-reports/test_cuda_sanitizer_1.1_06ff5e3bcde71deb_.log (deflated 80%)
2025-12-04T16:18:56.8644493Z   adding: test/test-reports/inductor.test_coordinate_descent_tuner_1.1_ec23ddb0902f120e_.log (deflated 68%)
2025-12-04T16:18:56.8645464Z   adding: test/test-reports/inductor.test_inplace_padding_1.1_79ffe73bfaa271da_.log (deflated 67%)
2025-12-04T16:18:56.8646380Z   adding: test/test-reports/inductor.test_cudacodecache_1.1_0486dc99f2c38224_.log (deflated 56%)
2025-12-04T16:18:56.8647293Z   adding: test/test-reports/inductor.test_minifier_utils_1.1_29e2300addd2b151_.log (deflated 59%)
2025-12-04T16:18:56.8648184Z   adding: test/test-reports/inductor.test_debug_trace_1.1_9dbcd0e5470fca07_.log (deflated 61%)
2025-12-04T16:18:56.8660111Z   adding: test/test-reports/inductor.test_foreach_1.1_72dc555a9d39f8a0_.log (deflated 93%)
2025-12-04T16:18:56.8678775Z   adding: test/test-reports/inductor.test_cache_1.1_b15a3258d122eb10_.log (deflated 95%)
2025-12-04T16:18:56.8679587Z   adding: test/test-reports/dynamo.test_config_1.1_34b955669d56d548_.log (deflated 62%)
2025-12-04T16:18:56.8680427Z   adding: test/test-reports/dynamo.test_metrics_context_1.1_5c0162a494019d34_.log (deflated 72%)
2025-12-04T16:18:56.8681272Z   adding: test/test-reports/export.test_package_1.1_c7910f2956ab0b71_.log (deflated 59%)
2025-12-04T16:18:56.8682111Z   adding: test/test-reports/dynamo.test_nops_1.1_eec8955a89c0749e_.log (deflated 58%)
2025-12-04T16:18:56.8682906Z   adding: test/test-reports/test_bundled_inputs_1.1_395d728a16287961_.log (deflated 73%)
2025-12-04T16:18:56.8683808Z   adding: test/test-reports/inductor.test_graph_transform_observer_1.1_2166094392cbcf10_.log (deflated 54%)
2025-12-04T16:18:56.8685087Z   adding: test/test-reports/export.test_db_1.1_e88cbc04d8a44796_.log (deflated 82%)
2025-12-04T16:18:56.8685908Z   adding: test/test-reports/dynamo.test_export_mutations_1.1_68937c62c4814f0f_.log (deflated 71%)
2025-12-04T16:18:56.8686827Z   adding: test/test-reports/inductor.test_config_1.1_8da77f3c96eb0a54_.log (deflated 74%)
2025-12-04T16:18:56.8687690Z   adding: test/test-reports/inductor.test_dependencies_1.1_a229a828add2b21e_.log (deflated 67%)
2025-12-04T16:18:56.8688639Z   adding: test/test-reports/inductor.test_fuzzer_1.1_7ef41a4207e7fec8_.log (deflated 70%)
2025-12-04T16:18:56.8689444Z   adding: test/test-reports/dynamo.test_global_1.1_be67321ce36fdfe2_.log (deflated 73%)
2025-12-04T16:18:56.9424490Z   adding: test/test-reports/inductor.test_control_flow_1.4_b6ec092c04daf6c8_.log (deflated 97%)
2025-12-04T16:18:56.9425378Z   adding: test/test-reports/dynamo.test_cudagraphs_1.1_f31f593cd6865772_.log (deflated 68%)
2025-12-04T16:18:56.9426226Z   adding: test/test-reports/inductor.test_alignment_1.1_c850ab1c90ef7284_.log (deflated 73%)
2025-12-04T16:18:56.9427068Z   adding: test/test-reports/dynamo.test_profiler_1.1_bdf79e2257b8f437_.log (deflated 72%)
2025-12-04T16:18:56.9429247Z   adding: test/test-reports/dynamo.test_guard_serialization_1.1_ca95c718e2b65acd_.log (deflated 84%)
2025-12-04T16:18:56.9433095Z   adding: test/test-reports/dynamo.test_dicts_1.1_9286d343eb07609f_.log (deflated 87%)
2025-12-04T16:18:56.9433919Z   adding: test/test-reports/dynamo.test_optimizers_1.1_6e8896f6f8ab34bf_.log (deflated 56%)
2025-12-04T16:18:56.9456027Z   adding: test/test-reports/export.test_torchbind_1.1_2a7aef954986f1ed_.log (deflated 96%)
2025-12-04T16:18:56.9456894Z   adding: test/test-reports/dynamo.test_python_dispatcher_1.1_d5e45034fa548233_.log (deflated 69%)
2025-12-04T16:18:56.9457751Z   adding: test/test-reports/export.test_swap_1.1_75b32b5d64f61c05_.log (deflated 78%)
2025-12-04T16:18:56.9459164Z   adding: test/test-reports/export.test_unflatten_1.1_e240ad71aaf7be43_.log (deflated 78%)
2025-12-04T16:18:56.9460069Z   adding: test/test-reports/dynamo.test_verify_correctness_1.1_c32bdac20cc2dbcb_.log (deflated 67%)
2025-12-04T16:18:56.9462836Z   adding: test/test-reports/inductor.test_fxir_backend_1.1_615cfb6d9761ce74_.log (deflated 84%)
2025-12-04T16:18:56.9464923Z   adding: test/test-reports/dynamo.test_structured_trace_1.1_e2032e57f1fbb9a7_.log (deflated 82%)
2025-12-04T16:18:56.9465799Z   adding: test/test-reports/dynamo.test_torchrec_1.1_ef7e4418db36eb14_.log (deflated 49%)
2025-12-04T16:18:56.9466666Z   adding: test/test-reports/test_model_exports_to_core_aten_1.1_1858ccc543938d86_.log (deflated 52%)
2025-12-04T16:18:56.9467573Z   adding: test/test-reports/dynamo.test_precompile_context_1.1_a5d2ca6b4ab870b9_.log (deflated 60%)
2025-12-04T16:18:56.9468469Z   adding: test/test-reports/dynamo.test_trace_rules_1.1_6759ebf57891eeeb_.log (deflated 65%)
2025-12-04T16:18:56.9469303Z   adding: test/test-reports/export.test_upgrader_1.1_ed15a90621ede266_.log (deflated 66%)
2025-12-04T16:18:56.9470110Z   adding: test/test-reports/dynamo.test_hooks_1.1_66426e5cf57243c0_.log (deflated 81%)
2025-12-04T16:18:56.9472278Z   adding: test/test-reports/dynamo.test_generator_1.1_f207b5be74916c07_.log (deflated 86%)
2025-12-04T16:18:56.9473107Z   adding: test/test-reports/export.test_verifier_1.1_96a0b4295b5beb1c_.log (deflated 71%)
2025-12-04T16:18:56.9476104Z   adding: test/test-reports/export.test_sparse_2.2_dc3ae5c04c4515a4_.log (deflated 89%)
2025-12-04T16:18:56.9477243Z   adding: test/test-reports/functorch.test_ac_1.1_99b1ba004ab023a0_.log (deflated 68%)
2025-12-04T16:18:56.9478070Z   adding: test/test-reports/test_out_dtype_op_1.1_3e48e335f34b8277_.log (deflated 72%)
2025-12-04T16:18:56.9486926Z   adding: test/test-reports/torch_np.test_ufuncs_basic_1.1_5b79d2f51b6173f9_.log (deflated 95%)
2025-12-04T16:18:56.9487791Z   adding: test/test-reports/lazy.test_step_closures_1.1_f2cf8fda3341fdfb_.log (deflated 62%)
2025-12-04T16:18:56.9488680Z   adding: test/test-reports/functorch.dim.test_getsetitem_1.1_f956801402f0c75a_.log (deflated 79%)
2025-12-04T16:18:56.9495880Z   adding: test/test-reports/torch_np.numpy_tests.core.test_numeric_1.1_c2ce2dbd13566161_.log (deflated 90%)
2025-12-04T16:18:56.9500436Z   adding: test/test-reports/test_indexing_1.1_2824065dc4dc1509_.log (deflated 90%)
2025-12-04T16:18:56.9523203Z   adding: test/test-reports/torch_np.numpy_tests.core.test_multiarray_1.1_f5a85c7d65f3960a_.log (deflated 93%)
2025-12-04T16:18:56.9524179Z   adding: test/test-reports/test_itt_1.1_0c67806275155360_.log (deflated 49%)
2025-12-04T16:18:56.9526116Z   adding: test/test-reports/torch_np.numpy_tests.fft.test_pocketfft_1.1_5bba81624a9a4669_.log (deflated 90%)
2025-12-04T16:18:56.9539119Z   adding: test/test-reports/torch_np.numpy_tests.lib.test_function_base_1.1_66e1a2bc19dbe7b5_.log (deflated 93%)
2025-12-04T16:18:56.9543941Z   adding: test/test-reports/test_masked_1.1_f4f98418cc401a0c_.log (deflated 92%)
2025-12-04T16:18:56.9544724Z   adding: test/test-reports/optim.test_lrscheduler_1.1_50b469a96bd12a6b_.log (deflated 7%)
2025-12-04T16:18:56.9547068Z   adding: test/test-reports/test_datapipe_1.1_628e5e9adba39130_.log (deflated 85%)
2025-12-04T16:18:56.9564330Z   adding: test/test-reports/nn.test_convolution_1.1_d98f421ddfbea09e_.log (deflated 95%)
2025-12-04T16:18:56.9565880Z   adding: test/test-reports/cpp_extensions.libtorch_agnostic_2_10_extension.test_version_compatibility_1.1_38e9912ded2d6880_.log (deflated 87%)
2025-12-04T16:18:56.9608888Z ##[group]Run # Remove any previous debugging artifacts if they exist
2025-12-04T16:18:56.9609956Z # Remove any previous debugging artifacts if they exist
2025-12-04T16:18:56.9610772Z rm -f debug-*.zip
2025-12-04T16:18:56.9611301Z if [ -d 'test/debug' ]; then
2025-12-04T16:18:56.9611770Z   zip -r "debug-${FILE_SUFFIX}.zip" test/debug
2025-12-04T16:18:56.9612163Z fi
2025-12-04T16:18:56.9618799Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T16:18:56.9619240Z env:
2025-12-04T16:18:56.9619485Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:56.9619794Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:56.9620161Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:56.9620800Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:56.9621614Z   FILE_SUFFIX: test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427
2025-12-04T16:18:56.9622186Z ##[endgroup]
2025-12-04T16:18:56.9712296Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T16:18:56.9712674Z with:
2025-12-04T16:18:56.9712926Z   s3-bucket: gha-artifacts
2025-12-04T16:18:56.9713306Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact

2025-12-04T16:18:56.9713717Z   retention-days: 14
2025-12-04T16:18:56.9714003Z   if-no-files-found: warn
2025-12-04T16:18:56.9714320Z   path: test-jsons-*.zip
2025-12-04T16:18:56.9714613Z   name: artifact
2025-12-04T16:18:56.9714861Z   region: us-east-1
2025-12-04T16:18:56.9715120Z env:
2025-12-04T16:18:56.9715365Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:56.9715657Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:56.9716023Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:56.9716673Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:56.9717242Z ##[endgroup]
2025-12-04T16:18:57.3691122Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T16:18:57.3691668Z With the provided path, there will be 1 file uploaded
2025-12-04T16:18:57.3692248Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T16:18:57.3747183Z Starting upload of test-jsons-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip
2025-12-04T16:18:57.5813652Z Finished upload of test-jsons-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip
2025-12-04T16:18:57.6044540Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T16:18:57.6044933Z with:
2025-12-04T16:18:57.6045194Z   s3-bucket: gha-artifacts
2025-12-04T16:18:57.6045738Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact

2025-12-04T16:18:57.6046141Z   retention-days: 14
2025-12-04T16:18:57.6046435Z   if-no-files-found: error
2025-12-04T16:18:57.6046754Z   path: test-reports-*.zip
2025-12-04T16:18:57.6047055Z   name: artifact
2025-12-04T16:18:57.6047379Z   region: us-east-1
2025-12-04T16:18:57.6047641Z env:
2025-12-04T16:18:57.6047882Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:57.6048174Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:57.6048683Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:57.6049333Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:57.6049901Z ##[endgroup]
2025-12-04T16:18:58.0020851Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T16:18:58.0021402Z With the provided path, there will be 1 file uploaded
2025-12-04T16:18:58.0021923Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T16:18:58.0075669Z Starting upload of test-reports-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip
2025-12-04T16:18:58.2456330Z Finished upload of test-reports-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip
2025-12-04T16:18:58.2664068Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T16:18:58.2664460Z with:
2025-12-04T16:18:58.2664714Z   s3-bucket: gha-artifacts
2025-12-04T16:18:58.2665087Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact

2025-12-04T16:18:58.2665488Z   retention-days: 14
2025-12-04T16:18:58.2665784Z   if-no-files-found: ignore
2025-12-04T16:18:58.2666099Z   path: logs-*.zip
2025-12-04T16:18:58.2666351Z   name: artifact
2025-12-04T16:18:58.2666617Z   region: us-east-1
2025-12-04T16:18:58.2666872Z env:
2025-12-04T16:18:58.2667094Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:58.2667400Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:58.2667771Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:58.2668427Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:58.2669006Z ##[endgroup]
2025-12-04T16:18:58.6362798Z NOTE: s3-prefix specified, ignoring name parameter
2025-12-04T16:18:58.6363361Z With the provided path, there will be 1 file uploaded
2025-12-04T16:18:58.6363924Z Uploading to s3 prefix: pytorch/pytorch/19922826259/1/artifact
2025-12-04T16:18:58.6418113Z Starting upload of logs-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip
2025-12-04T16:18:58.8954813Z Finished upload of logs-test-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu_57119749427.zip
2025-12-04T16:18:58.9162320Z ##[group]Run seemethere/upload-artifact-s3@v5
2025-12-04T16:18:58.9162701Z with:
2025-12-04T16:18:58.9162956Z   s3-bucket: gha-artifacts
2025-12-04T16:18:58.9163329Z   s3-prefix: pytorch/pytorch/19922826259/1/artifact

2025-12-04T16:18:58.9163739Z   retention-days: 14
2025-12-04T16:18:58.9164021Z   if-no-files-found: ignore
2025-12-04T16:18:58.9164331Z   path: debug-*.zip
2025-12-04T16:18:58.9164611Z   name: artifact
2025-12-04T16:18:58.9164857Z   region: us-east-1
2025-12-04T16:18:58.9165111Z env:
2025-12-04T16:18:58.9165347Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:58.9165641Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:58.9166009Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:58.9166664Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:58.9167230Z ##[endgroup]
2025-12-04T16:18:59.2792857Z No files were found with the provided path: debug-*.zip. No artifacts will be uploaded.
2025-12-04T16:18:59.3007666Z ##[group]Run # shellcheck disable=SC2156
2025-12-04T16:18:59.3008120Z # shellcheck disable=SC2156
2025-12-04T16:18:59.3008819Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \;
2025-12-04T16:18:59.3015745Z shell: /usr/bin/bash -e {0}
2025-12-04T16:18:59.3016067Z env:
2025-12-04T16:18:59.3016316Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:59.3016747Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:59.3017100Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:59.3017756Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:59.3018415Z ##[endgroup]
2025-12-04T16:18:59.6723415Z ##[group]Run seemethere/upload-artifact-s3@baba72d0712b404f646cebe0730933554ebce96a
2025-12-04T16:18:59.6723999Z with:
2025-12-04T16:18:59.6724421Z   name: coredumps-legacy_nvidia_driver-4-5-linux.g4dn.4xlarge.nvidia.gpu
2025-12-04T16:18:59.6724949Z   retention-days: 14
2025-12-04T16:18:59.6725247Z   if-no-files-found: ignore
2025-12-04T16:18:59.6725548Z   path: ./**/core.[1-9]*
2025-12-04T16:18:59.6725848Z   s3-bucket: gha-artifacts
2025-12-04T16:18:59.6726162Z   region: us-east-1
2025-12-04T16:18:59.6726408Z env:
2025-12-04T16:18:59.6726648Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:18:59.6726955Z   HAS_NVIDIA_GPU: true
2025-12-04T16:18:59.6727308Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:18:59.6727968Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:18:59.6728541Z ##[endgroup]
2025-12-04T16:19:09.0699652Z No files were found with the provided path: ./**/core.[1-9]*. No artifacts will be uploaded.
2025-12-04T16:19:09.1003065Z Prepare all required actions
2025-12-04T16:19:09.1003544Z Getting action download info
2025-12-04T16:19:09.2794618Z Download action repository 'actions/setup-python@v6' (SHA:83679a892e2d95755f2dac6acb0bfd1e9ac5d548)
2025-12-04T16:19:09.6655773Z ##[group]Run ./.github/actions/upload-utilization-stats
2025-12-04T16:19:09.6656213Z with:
2025-12-04T16:19:09.6656464Z   job_id: 57119749427
2025-12-04T16:19:09.6657198Z   job_name: linux-jammy-cuda12.4-py3.10-gcc11 / test (legacy_nvidia_driver, 4, 5, linux.g4dn.4xlarge.nvidia.gpu, mem_leak_check, unstable)
2025-12-04T16:19:09.6657991Z   workflow_name: periodic
2025-12-04T16:19:09.6658310Z   workflow_run_id: 19922826259
2025-12-04T16:19:09.6658633Z   workflow_attempt: 1
2025-12-04T16:19:09.6658907Z env:
2025-12-04T16:19:09.6659148Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:19:09.6659458Z   HAS_NVIDIA_GPU: true
2025-12-04T16:19:09.6659817Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:19:09.6660528Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:19:09.6661098Z ##[endgroup]
2025-12-04T16:19:09.6721579Z ##[group]Run actions/setup-python@v6
2025-12-04T16:19:09.6721947Z with:
2025-12-04T16:19:09.6722291Z   python-version: 3.10
2025-12-04T16:19:09.6722602Z   check-latest: false
2025-12-04T16:19:09.6723073Z   token: ***
2025-12-04T16:19:09.6723333Z   update-environment: true
2025-12-04T16:19:09.6723669Z   allow-prereleases: false
2025-12-04T16:19:09.6723994Z   freethreaded: false
2025-12-04T16:19:09.6724263Z env:
2025-12-04T16:19:09.6724508Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:19:09.6724809Z   HAS_NVIDIA_GPU: true
2025-12-04T16:19:09.6725158Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:19:09.6725838Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:19:09.6726414Z ##[endgroup]
2025-12-04T16:19:09.8415864Z ##[group]Installed versions
2025-12-04T16:19:09.8426223Z Version 3.10 was not found in the local cache
2025-12-04T16:19:09.8625094Z (node:341813) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
2025-12-04T16:19:09.8626045Z (Use `node --trace-deprecation ...` to show where the warning was created)
2025-12-04T16:19:10.2372067Z ##[error]The version '3.10' with architecture 'x64' was not found for this operating system.
The list of all available versions can be found here: https://raw.githubusercontent.com/actions/python-versions/main/versions-manifest.json
2025-12-04T16:19:10.2540733Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main
2025-12-04T16:19:10.2541247Z with:
2025-12-04T16:19:10.2541467Z env:
2025-12-04T16:19:10.2541814Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:19:10.2542130Z   HAS_NVIDIA_GPU: true
2025-12-04T16:19:10.2542503Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:19:10.2543147Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:19:10.2543798Z ##[endgroup]
2025-12-04T16:19:10.2561606Z ##[group]Run set -eou pipefail
2025-12-04T16:19:10.2562140Z set -eou pipefail
2025-12-04T16:19:10.2562455Z
2025-12-04T16:19:10.2562878Z echo "Holding runner for 2 hours until all ssh sessions have logged out"
2025-12-04T16:19:10.2563426Z for _ in $(seq 1440); do
2025-12-04T16:19:10.2563799Z     # Break if no ssh session exists anymore
2025-12-04T16:19:10.2564207Z     if [ "$(who)" = "" ]; then
2025-12-04T16:19:10.2564592Z       break
2025-12-04T16:19:10.2564862Z     fi
2025-12-04T16:19:10.2565119Z     echo "."
2025-12-04T16:19:10.2565397Z     sleep 5
2025-12-04T16:19:10.2565657Z done
2025-12-04T16:19:10.2572854Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T16:19:10.2573301Z env:
2025-12-04T16:19:10.2573538Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:19:10.2573849Z   HAS_NVIDIA_GPU: true
2025-12-04T16:19:10.2574219Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:19:10.2574857Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:19:10.2575436Z ##[endgroup]
2025-12-04T16:19:10.2608761Z Holding runner for 2 hours until all ssh sessions have logged out
2025-12-04T16:19:10.2693970Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T16:19:10.2694622Z # ignore expansion of "docker ps -q" since it could be empty
2025-12-04T16:19:10.2695146Z # shellcheck disable=SC2046
2025-12-04T16:19:10.2695541Z docker stop $(docker ps -q) || true
2025-12-04T16:19:10.2695949Z # Prune all of the docker images
2025-12-04T16:19:10.2696332Z docker system prune -af
2025-12-04T16:19:10.2703036Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T16:19:10.2703490Z env:
2025-12-04T16:19:10.2703725Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:19:10.2704048Z   HAS_NVIDIA_GPU: true
2025-12-04T16:19:10.2704414Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:19:10.2705057Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:19:10.2705640Z ##[endgroup]
2025-12-04T16:19:21.2713665Z 428ca50ff249
2025-12-04T16:19:25.9719388Z Deleted Containers:
2025-12-04T16:19:25.9719896Z 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:19:25.9720349Z 
2025-12-04T16:19:34.0788758Z Deleted Images:
2025-12-04T16:19:34.0789834Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image:pytorch-linux-jammy-cuda12.4-cudnn9-py3-gcc11-f0cd68561080d537ef3d3d6f81b25a6416ad600a
2025-12-04T16:19:34.0791372Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/ci-image@sha256:ae30f11a5b50741bd652aa0c94ad89ef791c4e50157eff642748620825cf7940
2025-12-04T16:19:34.0792450Z deleted: sha256:5465aa79632b68f6240c23f0d0b021df4d0fd595333b61a40d36a0cf73656024
2025-12-04T16:19:34.0793221Z deleted: sha256:f57a578c46f36a858c2be92210a89558688ee36b619af78c698952c0e3ef05ad
2025-12-04T16:19:34.0793988Z deleted: sha256:ce0698bd1efc811ccead0ecdad944b4839bf17bff387495b58e64cf8db0e210c
2025-12-04T16:19:34.0794763Z deleted: sha256:f0ee66f328fa98c40f336c64fee9a4b42e51a793cceea7f81932068bdc7bd315
2025-12-04T16:19:34.0795513Z deleted: sha256:ea24b30a25c161bd4bd564bfd90c36d88674a1aa59ef3e65647e926c76685be0
2025-12-04T16:19:34.0796278Z deleted: sha256:15bc0847ce5e60cc1a9b36d25283dc5648fb45e04aa9a8dec984af3c193e2f0b
2025-12-04T16:19:34.0798535Z deleted: sha256:3639aa26691090ef45641c75bffcb2e3f427f5e282abc93d607de4433bf90488
2025-12-04T16:19:34.0799348Z deleted: sha256:86258272ba477934c917d08b21e0da6000c268b60f5a9ae907038e7bf3236532
2025-12-04T16:19:34.0800197Z deleted: sha256:ba8e0040c98ddbf87acbc3ae6575b2933c09421ac7094a96e027d1fc9356fbb6
2025-12-04T16:19:34.0801144Z deleted: sha256:ca0176fc0de6cc059c4dbfc313434b5dea2c90dc24f2dc3a1061b941c7b3e6ca
2025-12-04T16:19:34.0801910Z deleted: sha256:cc6a480ab9e6091c6c206bc9b340611b3863258975e835769bd8f2a38b5d8c13
2025-12-04T16:19:34.0802823Z deleted: sha256:8465c24f0b284d8589ea191edeb80d1da07e4a59dfcfdcfa153bdf3d5d678d3e
2025-12-04T16:19:34.0803592Z deleted: sha256:b93bfbd3b55899c606fb98c5edbd21fd63114862a4f5a5b67c7aa63fc9ada9a3
2025-12-04T16:19:34.0804360Z deleted: sha256:6b7582e3ce445d82e9d2ae7769502119c39c1edbf5fe11c195615db8da846931
2025-12-04T16:19:34.0805097Z deleted: sha256:9d79615a9d9ae67110cc9da697933492b385b1e4708d30c2211625bea5d42f27
2025-12-04T16:19:34.0805940Z deleted: sha256:7132c6db5e7d5692786167dfb22dea62d8203dc7837b2d1de435c6e5c85e906e
2025-12-04T16:19:34.0806690Z deleted: sha256:d61bc13a0957d633ff633186c6cbdf48da1c551991d814281262e58709e225a8
2025-12-04T16:19:34.0807561Z deleted: sha256:0c348bbc3988acd329b3e42de4d2c73d5dc4942618716ca312d389d4f704f4bb
2025-12-04T16:19:34.0808302Z deleted: sha256:28d30dd15686ab6819c2f03388c9999bbdaef35e8756817297d795e00dd623fc
2025-12-04T16:19:34.0809056Z deleted: sha256:0a57608df6cffb31a0b24f2537b4dfe7a55bbe6ea02216703cc3172062ab9d75
2025-12-04T16:19:34.0809826Z deleted: sha256:43d23f49f4d70a54b4aff6f4f10d5c5a3d75b100abbbf281ad510177cc80cd99
2025-12-04T16:19:34.0810589Z deleted: sha256:f9e33c2e4c7b8e7179fba052da4d7c4acdc8287f253c95328ae04055755f88a4
2025-12-04T16:19:34.0811342Z deleted: sha256:cfce0930cf33c7136fc92511b9bcad570958363b55e9e0c82e9b8ebc29301356
2025-12-04T16:19:34.0812098Z deleted: sha256:9a709ae20528f500f51271ad2ce6a3d7196fe814a28ae73881901ecef9748c2a
2025-12-04T16:19:34.0812852Z deleted: sha256:68a1d16e9392be6fe939a58c5f941a0919408b5852e52cb04027b0b8777e2b0e
2025-12-04T16:19:34.0813587Z deleted: sha256:042a0022b3eea78f54015f4cf2888bcfa3b91deb0b08830a33c2814b93285dd9
2025-12-04T16:19:34.0814344Z deleted: sha256:a7ba703ff0aa305a608f3b4afd89c2ecd0d1244b127629145a2e691490abb271
2025-12-04T16:19:34.0815119Z deleted: sha256:be44f5fbae55066faba60eebf7065a082abf517ab8f2ebf8ece69e74d45def07
2025-12-04T16:19:34.0815957Z deleted: sha256:a01f1b0d88a8936d648f78787f56579bdb6617edf4620d0410ab6b118351bbb2
2025-12-04T16:19:34.0816902Z deleted: sha256:dc93f45553adafb5c6e7473711c833996f6884dab2da708ffc76b5cf65b8db9d
2025-12-04T16:19:34.0817933Z deleted: sha256:ffdba9ecb5890a9cb23368d781ff5484270b7f13c6d5629feca3512b58b9a0ac
2025-12-04T16:19:34.0818910Z deleted: sha256:268a91c420865628895871795b524436f5cc4403aa53d71f457db21bf42dd530
2025-12-04T16:19:34.0819659Z deleted: sha256:72450bfd97986ccc53d8fa76252130b464fdb3c5fd8e688546e8c3ce0b9d4394
2025-12-04T16:19:34.0820423Z deleted: sha256:63954235d3be0420af6ad2dae2b24849e3eee1edb10cf86d29137c3e19621f47
2025-12-04T16:19:34.0821185Z deleted: sha256:1c4e2d3e68e8a166d1965962077fe194ea00cad2ee636399c0c17ba5a94bdb9c
2025-12-04T16:19:34.0821957Z deleted: sha256:361cacbab7154a0cb62486f57d75b112feedbcc751a7d8f7bb02ec7a61b1fe0d
2025-12-04T16:19:34.0822730Z deleted: sha256:e653f6af92265f4300717bd617aab954cfbf049d4be32e890e57c2e8135be7f9
2025-12-04T16:19:34.0823491Z deleted: sha256:bfffeb2974ffc58c0669724812f701df860257ac3d047a7315a100beb0ea0507
2025-12-04T16:19:34.0824242Z deleted: sha256:6ae48d8efc75420f721058928fe8b1ccf48aa1bdc92de539b1f0db9248a41fcf
2025-12-04T16:19:34.0825006Z deleted: sha256:535c7026785a690366fc69ecbc9a81f1b58a46f63c782620591c1297406a2731
2025-12-04T16:19:34.0825777Z deleted: sha256:8462076c3cc8db6030f38e1137bfbef1aad85404ed4231285c1e06cd414d3e57
2025-12-04T16:19:34.0826539Z deleted: sha256:fe340d63ccb66e5b395b7900c1002a513e4afd7f610e9df5e7262c4f71e93bef
2025-12-04T16:19:34.0827278Z deleted: sha256:b61085386114396fe42144a4aa739b2a0b45f0c30a083462a2ea7b9b675c02aa
2025-12-04T16:19:34.0828237Z deleted: sha256:7772f25c05bcd5ede631d287b826aa108db67c773e377db98ffa73b0917f3629
2025-12-04T16:19:34.0829004Z deleted: sha256:3ea8a43d8193d05ecd6aa473b523a3569e11ae691eed9e6ffd693f23b0106035
2025-12-04T16:19:34.0829802Z deleted: sha256:34647b4087d29cf48a18668bb935a95fc8b2dac3522c2581397f0f27227047fd
2025-12-04T16:19:34.0830608Z deleted: sha256:b6a169f1ab01281c16562ad43b462a1a47a33be8d3cfae0a117ffa5c47d0b532
2025-12-04T16:19:34.0831372Z deleted: sha256:664173a33cd21248a2d73d2eba7887602e36fbc96002d991eb0bd0a2d574ac88
2025-12-04T16:19:34.0832424Z deleted: sha256:d67fdfe94c9a0228f17991cd3e958e36da96d4d597b46773cb7eed98c489f947
2025-12-04T16:19:34.0833461Z deleted: sha256:f2be0722250908742f067756b56ed3fa169daa2f1c8201a7ed4335b2fed2cae5
2025-12-04T16:19:34.0834712Z deleted: sha256:8614db257d8dc9e0f0ee8398a4a4d3c061b2797d6017daaf0696dd7f87633b3e
2025-12-04T16:19:34.0835476Z deleted: sha256:23ee0908a1bf254f1d4dd0591cc0c6801571b4d93950b6fd4fee57ca7e361da0
2025-12-04T16:19:34.0836245Z deleted: sha256:f627a99df4c0f370bd7fc8ea6be7695d8027f988aed52b65233cbcf78b01989b
2025-12-04T16:19:34.0836988Z deleted: sha256:d5e92389b59d4134cdb96113af964186602e98c392e76a8f26d4ea6e54056ccc
2025-12-04T16:19:34.0837751Z deleted: sha256:cbfccf44b9dc670c109634fbf19c2bfff2a3d5243bfa351c851d9fad3f1acfc2
2025-12-04T16:19:34.0838518Z deleted: sha256:1242535e81ad4bd713910a6c5e1b38375b12ed1bcd1b48419813a5ef28a5c84c
2025-12-04T16:19:34.0839271Z deleted: sha256:10b1394079cfe756a1ad9aa9aa3a2995bd5e46ef1e18029eb9eae0398f6d4e88
2025-12-04T16:19:34.0840015Z deleted: sha256:1d32da9a5f10e10c4a97a839151a1943d4db18494e8080bea91a6c9784fde067
2025-12-04T16:19:34.0840770Z deleted: sha256:af2fd59653ebd685a032ef800f8227c0d7b9b0e5ef397b30d4301e001c943e8b
2025-12-04T16:19:34.0841535Z deleted: sha256:c48d351980e3bd24d533ae55d1acc6a27911dffcbb03b2ae552d7ccc3e4cd74f
2025-12-04T16:19:34.0842342Z deleted: sha256:e663afac609b1b6c812ab45265c27d870b92c9fc6849939f0b8635da83cbfb53
2025-12-04T16:19:34.0843094Z deleted: sha256:f79dc17668331d4214ef24000d5c54a0bb2ba70f152d8523f571e2b76a303f4f
2025-12-04T16:19:34.0843853Z deleted: sha256:00de9606a6cd2a2dfb4ceffcb076474d027a1f6273894677090aee7478035865
2025-12-04T16:19:34.0844619Z deleted: sha256:cf35fe1d0317253b75ee17c12783c2561faebf9bf2c59c07ad4712c053246586
2025-12-04T16:19:34.0845358Z deleted: sha256:06622801490739d9db884c23c05a31a1ee86c41e888b34c3ccef23d37f2bdbb5
2025-12-04T16:19:34.0846118Z deleted: sha256:df5dafcaee865ddfb66e22075c63769836e01a627d6fe46658b6f4b4a25318d3
2025-12-04T16:19:34.0846890Z deleted: sha256:7949ae5c4df921feb0e2cd7bac1e402e1ab9135e758fa41cd567880b354b40bc
2025-12-04T16:19:34.0847655Z deleted: sha256:9f19148d820adb1d6e86d0ce68e21fbcedafa7c7ec6c45c9004fa3a607096923
2025-12-04T16:19:34.0848427Z deleted: sha256:1d37d963e85ce22ffaab56a1cf35b3411f34f9432dc5e49ebbdf6f30816cdfa8
2025-12-04T16:19:34.0849198Z deleted: sha256:bac6d91e3830e51e96879deaa3e6d0d39da076fa802ebda68f81bdf7ef8342d5
2025-12-04T16:19:34.0849959Z deleted: sha256:ffd496b07151c90e7ddd68a81a36471f51a544187982db5e34621358e1b29681
2025-12-04T16:19:34.0850711Z deleted: sha256:890b2042bdb9e22a614cea1be88366cd3ae15159bf78ac510b9daa6f802493a6
2025-12-04T16:19:34.0851475Z deleted: sha256:ddd9a57b20a8b45ae0e8e350ec266d50a1b9e9a7ff4921470eb38f004d50eb20
2025-12-04T16:19:34.0852238Z deleted: sha256:2f4f91684b8221bc5cbc3f14c7e00bb693854027a1a6de5ad6bdcd000bb579f2
2025-12-04T16:19:34.0852986Z deleted: sha256:9c01ec5e73233284a0f9bb42de59696a1fa61caacacdf63d04df5ebd73895d77
2025-12-04T16:19:34.0853745Z deleted: sha256:f6153a90f0f5316b03f1464826325a1578231b89b3c1f1c83cc7cebdd41cee2a
2025-12-04T16:19:34.0854494Z deleted: sha256:4e89cd2181813af7fd2219923bae493e33111d8b4ebd76f257b7fb26744fda28
2025-12-04T16:19:34.0855256Z deleted: sha256:a0b77eb4054db8f2ea2ec957b3941b4aeee14b59e94a99a1521f90d6e41faf0e
2025-12-04T16:19:34.0855999Z deleted: sha256:1a1b2848f15aa5114f5a67e3705439512880bf1a7a6436cc67760c59b5f10c46
2025-12-04T16:19:34.0856735Z deleted: sha256:004fc01362840c164664c18580e479546fa0b7f9599487558f80190aec30e2b5
2025-12-04T16:19:34.0857590Z deleted: sha256:35f36e20799f0a0dead81bc3701732e43489264e6bee9fcb789b376a99e17e78
2025-12-04T16:19:34.0858347Z deleted: sha256:1207fd2ede86015c3f105620cb491e8199d2060a4a87490de358286d0ae52e4e
2025-12-04T16:19:34.0859096Z deleted: sha256:02dccb85ee744d1fbb819c6da618b2c52a3e4affc89e407f79b875e7b3bbb7df
2025-12-04T16:19:34.0859917Z deleted: sha256:d22e6ff9c3ac9dabbcc6052e1459f8dc4ebd19bd057bd0688615d6cc3ebb5cf0
2025-12-04T16:19:34.0860685Z deleted: sha256:73974f74b436f39a2fdb6461b1e3f7c3e41c73325776fa71d16b942a5b4a365b
2025-12-04T16:19:34.0861362Z untagged: public.ecr.aws/docker/library/python:3.13
2025-12-04T16:19:34.0862201Z untagged: public.ecr.aws/docker/library/python@sha256:3f986299a7b8b44b0d8cf9bda2b22361ce5c3058ef5d7cb17fb7452506680ab0
2025-12-04T16:19:34.0863192Z deleted: sha256:44438aecfedf7b6086fce506dae0db5ba7fc0027f9b743f1a75a6b5cbc7de70a
2025-12-04T16:19:34.0863965Z deleted: sha256:6f09a1f5d8a107c2532fbd116e75116cb75fa77b1a7d72d3bdf1ac12de152acd
2025-12-04T16:19:34.0864730Z deleted: sha256:fe5f3ac0be086125eb1e3cd10cc33e8e426f4e079381f7ce5a987b626e99fa67
2025-12-04T16:19:34.0865497Z deleted: sha256:79dd2061a22cf919cfc4f1f02704bfda09afadb017265e670ee54441d296c06c
2025-12-04T16:19:34.0866267Z deleted: sha256:9447ad402aafdbee17e999b0ec84ad89c2646dbebf054d469d4f8bee77f66212
2025-12-04T16:19:34.0867012Z deleted: sha256:7a4909f3c1975be52292f53107495ee1b41c17494918767ccedf1cf1688ae318
2025-12-04T16:19:34.0867753Z deleted: sha256:3474923d97f1f498237650a7d51bd4aea37d5e6b9d8a778777920584af5dd560
2025-12-04T16:19:34.0868501Z deleted: sha256:683afd1773444401a9cbd24842ee5d9154a11abb4fab63ddea5c03df788597ee
2025-12-04T16:19:34.0868951Z 
2025-12-04T16:19:34.0869087Z Total reclaimed space: 35.13GB
2025-12-04T16:19:34.0905293Z ##[group]Run set +e
2025-12-04T16:19:34.0905693Z set +e
2025-12-04T16:19:34.0905948Z set -x
2025-12-04T16:19:34.0906203Z 
2025-12-04T16:19:34.0906447Z nvidia-smi
2025-12-04T16:19:34.0906983Z # NB: Surprisingly, nvidia-smi command returns successfully with return code 0 even in
2025-12-04T16:19:34.0907804Z # the case where the driver has already crashed as it still can get the driver version
2025-12-04T16:19:34.0908603Z # and some basic information like the bus ID.  However, the rest of the information
2025-12-04T16:19:34.0909224Z # would be missing (ERR!), for example:
2025-12-04T16:19:34.0909592Z #
2025-12-04T16:19:34.0909952Z # +-----------------------------------------------------------------------------+
2025-12-04T16:19:34.0910588Z # | NVIDIA-SMI 525.89.02    Driver Version: 525.89.02    CUDA Version: 12.0     |
2025-12-04T16:19:34.0911255Z # |-------------------------------+----------------------+----------------------+
2025-12-04T16:19:34.0911888Z # | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2025-12-04T16:19:34.0912579Z # | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2025-12-04T16:19:34.0913143Z # |                               |                      |               MIG M. |
2025-12-04T16:19:34.0913563Z # |===============================+======================+======================|
2025-12-04T16:19:34.0914053Z # |   0  ERR!                Off  | 00000000:00:1E.0 Off |                 ERR! |
2025-12-04T16:19:34.0914620Z # |ERR!  ERR! ERR!    ERR! / ERR! |   4184MiB / 23028MiB |    ERR!      Default |
2025-12-04T16:19:34.0915141Z # |                               |                      |                 ERR! |
2025-12-04T16:19:34.0915633Z # +-------------------------------+----------------------+----------------------+
2025-12-04T16:19:34.0916084Z #
2025-12-04T16:19:34.0916437Z # +-----------------------------------------------------------------------------+
2025-12-04T16:19:34.0916988Z # | Processes:                                                                  |
2025-12-04T16:19:34.0917539Z # |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2025-12-04T16:19:34.0918071Z # |        ID   ID                                                   Usage      |
2025-12-04T16:19:34.0918605Z # |=============================================================================|
2025-12-04T16:19:34.0919104Z # +-----------------------------------------------------------------------------+
2025-12-04T16:19:34.0919537Z #
2025-12-04T16:19:34.0920040Z # This should be reported as a failure instead as it will guarantee to fail when
2025-12-04T16:19:34.0920641Z # Docker tries to run with --gpus all
2025-12-04T16:19:34.0921015Z #
2025-12-04T16:19:34.0921428Z # So, the correct check here is to query one of the missing piece of info like
2025-12-04T16:19:34.0922130Z # GPU name, so that the command can fail accordingly
2025-12-04T16:19:34.0922715Z nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T16:19:34.0923203Z NVIDIA_SMI_STATUS=$?
2025-12-04T16:19:34.0923517Z 
2025-12-04T16:19:34.0924029Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action
2025-12-04T16:19:34.0924807Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
2025-12-04T16:19:34.0925493Z   echo "NVIDIA driver installation has failed, shutting down the runner..."
2025-12-04T16:19:34.0926100Z   .github/scripts/stop_runner_service.sh
2025-12-04T16:19:34.0926488Z fi
2025-12-04T16:19:34.0926720Z 
2025-12-04T16:19:34.0927293Z # For runner with multiple GPUs, we also want to confirm that the number of GPUs are the
2025-12-04T16:19:34.0928039Z # power of 2, i.e. 1, 2, 4, or 8. This is to avoid flaky test issue when one GPU fails
2025-12-04T16:19:34.0928662Z # https://github.com/pytorch/test-infra/issues/4000
2025-12-04T16:19:34.0929161Z GPU_COUNT=$(nvidia-smi --list-gpus | wc -l)
2025-12-04T16:19:34.0929579Z NVIDIA_SMI_STATUS=$?
2025-12-04T16:19:34.0929886Z 
2025-12-04T16:19:34.0930379Z # These are acceptable return code from nvidia-smi as copied from setup-nvidia GitHub action
2025-12-04T16:19:34.0931151Z if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
2025-12-04T16:19:34.0931844Z   echo "NVIDIA driver installation has failed, shutting down the runner..."
2025-12-04T16:19:34.0932444Z   .github/scripts/stop_runner_service.sh
2025-12-04T16:19:34.0932813Z fi
2025-12-04T16:19:34.0933058Z 
2025-12-04T16:19:34.0933346Z # Check the GPU count to be a power of 2
2025-12-04T16:19:34.0934004Z if [ "$GPU_COUNT" -le 8 ] && [ "$GPU_COUNT" -ne 1 ] && [ "$GPU_COUNT" -ne 2 ] && [ "$GPU_COUNT" -ne 4 ] && [ "$GPU_COUNT" -ne 8 ]; then
2025-12-04T16:19:34.0934885Z   echo "NVIDIA driver detects $GPU_COUNT GPUs. The runner has a broken GPU, shutting it down..."
2025-12-04T16:19:34.0935560Z   .github/scripts/stop_runner_service.sh
2025-12-04T16:19:34.0935943Z fi
2025-12-04T16:19:34.0944872Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T16:19:34.0945329Z env:
2025-12-04T16:19:34.0945581Z   GIT_DEFAULT_BRANCH: main
2025-12-04T16:19:34.0945876Z   HAS_NVIDIA_GPU: true
2025-12-04T16:19:34.0946242Z   GPU_FLAG: --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all
2025-12-04T16:19:34.0946902Z   DOCKER_CONTAINER_ID: 428ca50ff249576164b3a85d863c1596ac9bd68475a0cd57f4914df88099c3d4
2025-12-04T16:19:34.0947482Z ##[endgroup]
2025-12-04T16:19:34.0979436Z + nvidia-smi
2025-12-04T16:19:34.1179799Z Thu Dec  4 16:19:34 2025       
2025-12-04T16:19:34.1180241Z +-----------------------------------------------------------------------------+
2025-12-04T16:19:34.1180853Z | NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
2025-12-04T16:19:34.1181433Z |-------------------------------+----------------------+----------------------+
2025-12-04T16:19:34.1182044Z | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2025-12-04T16:19:34.1182694Z | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2025-12-04T16:19:34.1183327Z |                               |                      |               MIG M. |
2025-12-04T16:19:34.1183734Z |===============================+======================+======================|
2025-12-04T16:19:34.1343049Z |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
2025-12-04T16:19:34.1343584Z | N/A   25C    P8    16W /  70W |      2MiB / 15360MiB |      0%      Default |
2025-12-04T16:19:34.1344039Z |                               |                      |                  N/A |
2025-12-04T16:19:34.1344508Z +-------------------------------+----------------------+----------------------+
2025-12-04T16:19:34.1344978Z                                                                                
2025-12-04T16:19:34.1345443Z +-----------------------------------------------------------------------------+
2025-12-04T16:19:34.1345941Z | Processes:                                                                  |
2025-12-04T16:19:34.1346470Z |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2025-12-04T16:19:34.1346959Z |        ID   ID                                                   Usage      |
2025-12-04T16:19:34.1347367Z |=============================================================================|
2025-12-04T16:19:34.1348847Z |  No running processes found                                                 |
2025-12-04T16:19:34.1349429Z +-----------------------------------------------------------------------------+
2025-12-04T16:19:34.2169433Z + nvidia-smi --query-gpu=gpu_name --format=csv,noheader --id=0
2025-12-04T16:19:34.2346645Z Tesla T4
2025-12-04T16:19:34.2385087Z + NVIDIA_SMI_STATUS=0
2025-12-04T16:19:34.2385409Z + '[' 0 -ne 0 ']'
2025-12-04T16:19:34.2391857Z ++ nvidia-smi --list-gpus
2025-12-04T16:19:34.2392517Z ++ wc -l
2025-12-04T16:19:34.2588979Z + GPU_COUNT=1
2025-12-04T16:19:34.2589256Z + NVIDIA_SMI_STATUS=0
2025-12-04T16:19:34.2589545Z + '[' 0 -ne 0 ']'
2025-12-04T16:19:34.2589833Z + '[' 1 -le 8 ']'
2025-12-04T16:19:34.2590082Z + '[' 1 -ne 1 ']'
2025-12-04T16:19:34.2684243Z Post job cleanup.
2025-12-04T16:19:34.2771834Z Post job cleanup.
2025-12-04T16:19:34.2823392Z Post job cleanup.
2025-12-04T16:19:34.3953872Z [command]/usr/bin/git version
2025-12-04T16:19:34.4016742Z git version 2.50.1
2025-12-04T16:19:34.4055669Z Copying '/home/ec2-user/.gitconfig' to '/home/ec2-user/actions-runner/_work/_temp/9fc44b29-008f-4bd6-acfc-6f30b31731c8/.gitconfig'
2025-12-04T16:19:34.4065179Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/9fc44b29-008f-4bd6-acfc-6f30b31731c8' before making global git config changes
2025-12-04T16:19:34.4066416Z Adding repository directory to the temporary git global config as a safe directory
2025-12-04T16:19:34.4070800Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2025-12-04T16:19:34.4113438Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2025-12-04T16:19:34.4154680Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2025-12-04T16:19:34.4497899Z Entering 'android/libs/fbjni'
2025-12-04T16:19:34.4561943Z Entering 'third_party/FP16'
2025-12-04T16:19:34.4624861Z Entering 'third_party/FXdiv'
2025-12-04T16:19:34.4690950Z Entering 'third_party/NNPACK'
2025-12-04T16:19:34.4754083Z Entering 'third_party/NVTX'
2025-12-04T16:19:34.4819978Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T16:19:34.4883175Z Entering 'third_party/XNNPACK'
2025-12-04T16:19:34.4965716Z Entering 'third_party/aiter'
2025-12-04T16:19:34.5029369Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T16:19:34.5101776Z Entering 'third_party/benchmark'
2025-12-04T16:19:34.5164752Z Entering 'third_party/composable_kernel'
2025-12-04T16:19:34.5241075Z Entering 'third_party/cpp-httplib'
2025-12-04T16:19:34.5304489Z Entering 'third_party/cpuinfo'
2025-12-04T16:19:34.5367206Z Entering 'third_party/cudnn_frontend'
2025-12-04T16:19:34.5430714Z Entering 'third_party/cutlass'
2025-12-04T16:19:34.5505795Z Entering 'third_party/fbgemm'
2025-12-04T16:19:34.5570170Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T16:19:34.5632437Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T16:19:34.5702926Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T16:19:34.5765200Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T16:19:34.5838244Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T16:19:34.5901665Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T16:19:34.5962739Z Entering 'third_party/fbgemm/external/json'
2025-12-04T16:19:34.6027722Z Entering 'third_party/flash-attention'
2025-12-04T16:19:34.6091978Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T16:19:34.6160319Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T16:19:34.6233393Z Entering 'third_party/flatbuffers'
2025-12-04T16:19:34.6299488Z Entering 'third_party/fmt'
2025-12-04T16:19:34.6363441Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T16:19:34.6428365Z Entering 'third_party/gloo'
2025-12-04T16:19:34.6492023Z Entering 'third_party/googletest'
2025-12-04T16:19:34.6555633Z Entering 'third_party/ideep'
2025-12-04T16:19:34.6617088Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T16:19:34.6687290Z Entering 'third_party/ittapi'
2025-12-04T16:19:34.6750059Z Entering 'third_party/kineto'
2025-12-04T16:19:34.6814718Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T16:19:34.6876258Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T16:19:34.6940213Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T16:19:34.7001519Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T16:19:34.7063521Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T16:19:34.7126249Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T16:19:34.7191400Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T16:19:34.7254918Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T16:19:34.7318054Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T16:19:34.7381188Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T16:19:34.7445401Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T16:19:34.7507899Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T16:19:34.7571313Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T16:19:34.7638514Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T16:19:34.7700564Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T16:19:34.7765947Z Entering 'third_party/kleidiai'
2025-12-04T16:19:34.7831379Z Entering 'third_party/mimalloc'
2025-12-04T16:19:34.7894741Z Entering 'third_party/nlohmann'
2025-12-04T16:19:34.7963067Z Entering 'third_party/onnx'
2025-12-04T16:19:34.8046708Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T16:19:34.8111064Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T16:19:34.8180457Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T16:19:34.8244391Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T16:19:34.8307804Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T16:19:34.8370261Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T16:19:34.8434468Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T16:19:34.8496036Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T16:19:34.8557206Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T16:19:34.8618587Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T16:19:34.8682417Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T16:19:34.8745905Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T16:19:34.8833110Z Entering 'third_party/pocketfft'
2025-12-04T16:19:34.8896167Z Entering 'third_party/protobuf'
2025-12-04T16:19:34.8962246Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T16:19:34.9023301Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T16:19:34.9086683Z Entering 'third_party/psimd'
2025-12-04T16:19:34.9149739Z Entering 'third_party/pthreadpool'
2025-12-04T16:19:34.9211742Z Entering 'third_party/pybind11'
2025-12-04T16:19:34.9275296Z Entering 'third_party/python-peachpy'
2025-12-04T16:19:34.9337588Z Entering 'third_party/sleef'
2025-12-04T16:19:34.9401725Z Entering 'third_party/tensorpipe'
2025-12-04T16:19:34.9464636Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T16:19:34.9527263Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T16:19:34.9589974Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T16:19:34.9670423Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T16:19:34.9730749Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T16:19:34.9818291Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2025-12-04T16:19:34.9842729Z http.https://github.com/.extraheader
2025-12-04T16:19:34.9854432Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2025-12-04T16:19:34.9890826Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2025-12-04T16:19:35.0226313Z Entering 'android/libs/fbjni'
2025-12-04T16:19:35.0269804Z http.https://github.com/.extraheader
2025-12-04T16:19:35.0308596Z Entering 'third_party/FP16'
2025-12-04T16:19:35.0351134Z http.https://github.com/.extraheader
2025-12-04T16:19:35.0389014Z Entering 'third_party/FXdiv'
2025-12-04T16:19:35.0432136Z http.https://github.com/.extraheader
2025-12-04T16:19:35.0470060Z Entering 'third_party/NNPACK'
2025-12-04T16:19:35.0515695Z http.https://github.com/.extraheader
2025-12-04T16:19:35.0555654Z Entering 'third_party/NVTX'
2025-12-04T16:19:35.0598666Z http.https://github.com/.extraheader
2025-12-04T16:19:35.0638748Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T16:19:35.0681554Z http.https://github.com/.extraheader
2025-12-04T16:19:35.0719010Z Entering 'third_party/XNNPACK'
2025-12-04T16:19:35.0762249Z http.https://github.com/.extraheader
2025-12-04T16:19:35.0817527Z Entering 'third_party/aiter'
2025-12-04T16:19:35.0860448Z http.https://github.com/.extraheader
2025-12-04T16:19:35.0902970Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T16:19:35.0944964Z http.https://github.com/.extraheader
2025-12-04T16:19:35.0994031Z Entering 'third_party/benchmark'
2025-12-04T16:19:35.1036751Z http.https://github.com/.extraheader
2025-12-04T16:19:35.1074544Z Entering 'third_party/composable_kernel'
2025-12-04T16:19:35.1117446Z http.https://github.com/.extraheader
2025-12-04T16:19:35.1166569Z Entering 'third_party/cpp-httplib'
2025-12-04T16:19:35.1210028Z http.https://github.com/.extraheader
2025-12-04T16:19:35.1248342Z Entering 'third_party/cpuinfo'
2025-12-04T16:19:35.1291685Z http.https://github.com/.extraheader
2025-12-04T16:19:35.1333529Z Entering 'third_party/cudnn_frontend'
2025-12-04T16:19:35.1376906Z http.https://github.com/.extraheader
2025-12-04T16:19:35.1415745Z Entering 'third_party/cutlass'
2025-12-04T16:19:35.1457795Z http.https://github.com/.extraheader
2025-12-04T16:19:35.1507458Z Entering 'third_party/fbgemm'
2025-12-04T16:19:35.1550891Z http.https://github.com/.extraheader
2025-12-04T16:19:35.1592093Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T16:19:35.1633964Z http.https://github.com/.extraheader
2025-12-04T16:19:35.1671314Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T16:19:35.1713897Z http.https://github.com/.extraheader
2025-12-04T16:19:35.1760280Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T16:19:35.1805717Z http.https://github.com/.extraheader
2025-12-04T16:19:35.1843714Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T16:19:35.1885334Z http.https://github.com/.extraheader
2025-12-04T16:19:35.1933890Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T16:19:35.1976128Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2013859Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T16:19:35.2055204Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2098493Z Entering 'third_party/fbgemm/external/json'
2025-12-04T16:19:35.2141215Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2181210Z Entering 'third_party/flash-attention'
2025-12-04T16:19:35.2224214Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2262606Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T16:19:35.2304987Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2349728Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T16:19:35.2391924Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2440157Z Entering 'third_party/flatbuffers'
2025-12-04T16:19:35.2482351Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2523726Z Entering 'third_party/fmt'
2025-12-04T16:19:35.2566263Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2605635Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T16:19:35.2648174Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2687639Z Entering 'third_party/gloo'
2025-12-04T16:19:35.2729798Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2767856Z Entering 'third_party/googletest'
2025-12-04T16:19:35.2811087Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2849851Z Entering 'third_party/ideep'
2025-12-04T16:19:35.2891777Z http.https://github.com/.extraheader
2025-12-04T16:19:35.2928658Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T16:19:35.2969073Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3016064Z Entering 'third_party/ittapi'
2025-12-04T16:19:35.3058995Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3096645Z Entering 'third_party/kineto'
2025-12-04T16:19:35.3139079Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3177298Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T16:19:35.3219420Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3257134Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T16:19:35.3299972Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3340357Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T16:19:35.3383639Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3422799Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T16:19:35.3464024Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3502350Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T16:19:35.3543420Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3579792Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T16:19:35.3622603Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3662002Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T16:19:35.3704633Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3741903Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T16:19:35.3785073Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3823840Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T16:19:35.3866069Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3905574Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T16:19:35.3946880Z http.https://github.com/.extraheader
2025-12-04T16:19:35.3984248Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T16:19:35.4028347Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4064824Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T16:19:35.4110405Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4151070Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T16:19:35.4193851Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4236405Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T16:19:35.4278749Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4316190Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T16:19:35.4357473Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4397279Z Entering 'third_party/kleidiai'
2025-12-04T16:19:35.4440470Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4478825Z Entering 'third_party/mimalloc'
2025-12-04T16:19:35.4522339Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4559670Z Entering 'third_party/nlohmann'
2025-12-04T16:19:35.4603102Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4642265Z Entering 'third_party/onnx'
2025-12-04T16:19:35.4684601Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4741927Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T16:19:35.4784378Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4825854Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T16:19:35.4868200Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4908600Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T16:19:35.4949084Z http.https://github.com/.extraheader
2025-12-04T16:19:35.4985872Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T16:19:35.5027608Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5066647Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T16:19:35.5108847Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5145924Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T16:19:35.5188062Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5227980Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T16:19:35.5269336Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5307061Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T16:19:35.5348185Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5385134Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T16:19:35.5426618Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5462086Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T16:19:35.5506414Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5544947Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T16:19:35.5586607Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5626904Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T16:19:35.5668822Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5729554Z Entering 'third_party/pocketfft'
2025-12-04T16:19:35.5773987Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5812202Z Entering 'third_party/protobuf'
2025-12-04T16:19:35.5855497Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5896106Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T16:19:35.5938326Z http.https://github.com/.extraheader
2025-12-04T16:19:35.5974980Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T16:19:35.6017629Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6056856Z Entering 'third_party/psimd'
2025-12-04T16:19:35.6099759Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6138281Z Entering 'third_party/pthreadpool'
2025-12-04T16:19:35.6180697Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6218159Z Entering 'third_party/pybind11'
2025-12-04T16:19:35.6261012Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6298833Z Entering 'third_party/python-peachpy'
2025-12-04T16:19:35.6341580Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6379730Z Entering 'third_party/sleef'
2025-12-04T16:19:35.6422220Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6459823Z Entering 'third_party/tensorpipe'
2025-12-04T16:19:35.6503409Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6540719Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T16:19:35.6582058Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6619693Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T16:19:35.6660681Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6697634Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T16:19:35.6738668Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6775529Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T16:19:35.6818794Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6854906Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T16:19:35.6896867Z http.https://github.com/.extraheader
2025-12-04T16:19:35.6957322Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:35.6991224Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url
2025-12-04T16:19:35.7335179Z Entering 'android/libs/fbjni'
2025-12-04T16:19:35.7363662Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config	remote.origin.url
2025-12-04T16:19:35.7382318Z Entering 'third_party/FP16'
2025-12-04T16:19:35.7411791Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config	remote.origin.url
2025-12-04T16:19:35.7429981Z Entering 'third_party/FXdiv'
2025-12-04T16:19:35.7458375Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config	remote.origin.url
2025-12-04T16:19:35.7476804Z Entering 'third_party/NNPACK'
2025-12-04T16:19:35.7506238Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config	remote.origin.url
2025-12-04T16:19:35.7524604Z Entering 'third_party/NVTX'
2025-12-04T16:19:35.7553856Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config	remote.origin.url
2025-12-04T16:19:35.7572927Z Entering 'third_party/VulkanMemoryAllocator'
2025-12-04T16:19:35.7600828Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config	remote.origin.url
2025-12-04T16:19:35.7619450Z Entering 'third_party/XNNPACK'
2025-12-04T16:19:35.7648270Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config	remote.origin.url
2025-12-04T16:19:35.7684507Z Entering 'third_party/aiter'
2025-12-04T16:19:35.7713595Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config	remote.origin.url
2025-12-04T16:19:35.7733168Z Entering 'third_party/aiter/3rdparty/composable_kernel'
2025-12-04T16:19:35.7760925Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config	remote.origin.url
2025-12-04T16:19:35.7787661Z Entering 'third_party/benchmark'
2025-12-04T16:19:35.7816714Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T16:19:35.7834913Z Entering 'third_party/composable_kernel'
2025-12-04T16:19:35.7863989Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config	remote.origin.url
2025-12-04T16:19:35.7890735Z Entering 'third_party/cpp-httplib'
2025-12-04T16:19:35.7919167Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config	remote.origin.url
2025-12-04T16:19:35.7937338Z Entering 'third_party/cpuinfo'
2025-12-04T16:19:35.7966012Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config	remote.origin.url
2025-12-04T16:19:35.7985594Z Entering 'third_party/cudnn_frontend'
2025-12-04T16:19:35.8014386Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config	remote.origin.url
2025-12-04T16:19:35.8033429Z Entering 'third_party/cutlass'
2025-12-04T16:19:35.8062170Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config	remote.origin.url
2025-12-04T16:19:35.8089635Z Entering 'third_party/fbgemm'
2025-12-04T16:19:35.8118136Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config	remote.origin.url
2025-12-04T16:19:35.8138893Z Entering 'third_party/fbgemm/external/asmjit'
2025-12-04T16:19:35.8166766Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config	remote.origin.url
2025-12-04T16:19:35.8184238Z Entering 'third_party/fbgemm/external/composable_kernel'
2025-12-04T16:19:35.8211947Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config	remote.origin.url
2025-12-04T16:19:35.8239068Z Entering 'third_party/fbgemm/external/cpuinfo'
2025-12-04T16:19:35.8267187Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config	remote.origin.url
2025-12-04T16:19:35.8285854Z Entering 'third_party/fbgemm/external/cutlass'
2025-12-04T16:19:35.8313432Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config	remote.origin.url
2025-12-04T16:19:35.8341944Z Entering 'third_party/fbgemm/external/googletest'
2025-12-04T16:19:35.8369257Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config	remote.origin.url
2025-12-04T16:19:35.8386654Z Entering 'third_party/fbgemm/external/hipify_torch'
2025-12-04T16:19:35.8415108Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config	remote.origin.url
2025-12-04T16:19:35.8432101Z Entering 'third_party/fbgemm/external/json'
2025-12-04T16:19:35.8459370Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config	remote.origin.url
2025-12-04T16:19:35.8480121Z Entering 'third_party/flash-attention'
2025-12-04T16:19:35.8509110Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config	remote.origin.url
2025-12-04T16:19:35.8527785Z Entering 'third_party/flash-attention/csrc/composable_kernel'
2025-12-04T16:19:35.8556178Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config	remote.origin.url
2025-12-04T16:19:35.8580463Z Entering 'third_party/flash-attention/csrc/cutlass'
2025-12-04T16:19:35.8608270Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config	remote.origin.url
2025-12-04T16:19:35.8636116Z Entering 'third_party/flatbuffers'
2025-12-04T16:19:35.8665421Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config	remote.origin.url
2025-12-04T16:19:35.8686831Z Entering 'third_party/fmt'
2025-12-04T16:19:35.8715038Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config	remote.origin.url
2025-12-04T16:19:35.8733341Z Entering 'third_party/gemmlowp/gemmlowp'
2025-12-04T16:19:35.8761791Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config	remote.origin.url
2025-12-04T16:19:35.8780741Z Entering 'third_party/gloo'
2025-12-04T16:19:35.8808983Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config	remote.origin.url
2025-12-04T16:19:35.8827956Z Entering 'third_party/googletest'
2025-12-04T16:19:35.8857079Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config	remote.origin.url
2025-12-04T16:19:35.8875725Z Entering 'third_party/ideep'
2025-12-04T16:19:35.8903967Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config	remote.origin.url
2025-12-04T16:19:35.8920709Z Entering 'third_party/ideep/mkl-dnn'
2025-12-04T16:19:35.8948285Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config	remote.origin.url
2025-12-04T16:19:35.8974137Z Entering 'third_party/ittapi'
2025-12-04T16:19:35.9005119Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config	remote.origin.url
2025-12-04T16:19:35.9023105Z Entering 'third_party/kineto'
2025-12-04T16:19:35.9052297Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config	remote.origin.url
2025-12-04T16:19:35.9070366Z Entering 'third_party/kineto/libkineto/third_party/dynolog'
2025-12-04T16:19:35.9098253Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config	remote.origin.url
2025-12-04T16:19:35.9116442Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/DCGM'
2025-12-04T16:19:35.9144229Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config	remote.origin.url
2025-12-04T16:19:35.9163710Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/cpr'
2025-12-04T16:19:35.9192505Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config	remote.origin.url
2025-12-04T16:19:35.9211437Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/fmt'
2025-12-04T16:19:35.9239540Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config	remote.origin.url
2025-12-04T16:19:35.9258540Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags'
2025-12-04T16:19:35.9286645Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config	remote.origin.url
2025-12-04T16:19:35.9303508Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/gflags/doc'
2025-12-04T16:19:35.9331578Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config	remote.origin.url
2025-12-04T16:19:35.9350848Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/glog'
2025-12-04T16:19:35.9378674Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config	remote.origin.url
2025-12-04T16:19:35.9396380Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/googletest'
2025-12-04T16:19:35.9433071Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config	remote.origin.url
2025-12-04T16:19:35.9450816Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/json'
2025-12-04T16:19:35.9479550Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config	remote.origin.url
2025-12-04T16:19:35.9498602Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/pfs'
2025-12-04T16:19:35.9528044Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config	remote.origin.url
2025-12-04T16:19:35.9545477Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp'
2025-12-04T16:19:35.9573528Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T16:19:35.9591489Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T16:19:35.9620212Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T16:19:35.9641020Z Entering 'third_party/kineto/libkineto/third_party/dynolog/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T16:19:35.9670567Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T16:19:35.9693213Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2025-12-04T16:19:35.9721763Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config	remote.origin.url
2025-12-04T16:19:35.9740078Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2025-12-04T16:19:35.9769015Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config	remote.origin.url
2025-12-04T16:19:35.9789842Z Entering 'third_party/kleidiai'
2025-12-04T16:19:35.9820524Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config	remote.origin.url
2025-12-04T16:19:35.9839859Z Entering 'third_party/mimalloc'
2025-12-04T16:19:35.9868544Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config	remote.origin.url
2025-12-04T16:19:35.9887562Z Entering 'third_party/nlohmann'
2025-12-04T16:19:35.9916983Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config	remote.origin.url
2025-12-04T16:19:35.9936893Z Entering 'third_party/onnx'
2025-12-04T16:19:35.9966639Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config	remote.origin.url
2025-12-04T16:19:36.0005868Z Entering 'third_party/onnx/third_party/pybind11'
2025-12-04T16:19:36.0034462Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T16:19:36.0055463Z Entering 'third_party/opentelemetry-cpp'
2025-12-04T16:19:36.0085174Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config	remote.origin.url
2025-12-04T16:19:36.0106434Z Entering 'third_party/opentelemetry-cpp/third_party/benchmark'
2025-12-04T16:19:36.0134202Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T16:19:36.0151452Z Entering 'third_party/opentelemetry-cpp/third_party/googletest'
2025-12-04T16:19:36.0180498Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config	remote.origin.url
2025-12-04T16:19:36.0198099Z Entering 'third_party/opentelemetry-cpp/third_party/ms-gsl'
2025-12-04T16:19:36.0226853Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config	remote.origin.url
2025-12-04T16:19:36.0244161Z Entering 'third_party/opentelemetry-cpp/third_party/nlohmann-json'
2025-12-04T16:19:36.0272701Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config	remote.origin.url
2025-12-04T16:19:36.0291600Z Entering 'third_party/opentelemetry-cpp/third_party/opentelemetry-proto'
2025-12-04T16:19:36.0319514Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config	remote.origin.url
2025-12-04T16:19:36.0336738Z Entering 'third_party/opentelemetry-cpp/third_party/opentracing-cpp'
2025-12-04T16:19:36.0364564Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config	remote.origin.url
2025-12-04T16:19:36.0381887Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp'
2025-12-04T16:19:36.0411176Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config	remote.origin.url
2025-12-04T16:19:36.0427382Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/civetweb'
2025-12-04T16:19:36.0455984Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config	remote.origin.url
2025-12-04T16:19:36.0475493Z Entering 'third_party/opentelemetry-cpp/third_party/prometheus-cpp/3rdparty/googletest'
2025-12-04T16:19:36.0504874Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config	remote.origin.url
2025-12-04T16:19:36.0524069Z Entering 'third_party/opentelemetry-cpp/tools/vcpkg'
2025-12-04T16:19:36.0551669Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config	remote.origin.url
2025-12-04T16:19:36.0592845Z Entering 'third_party/pocketfft'
2025-12-04T16:19:36.0622142Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config	remote.origin.url
2025-12-04T16:19:36.0639641Z Entering 'third_party/protobuf'
2025-12-04T16:19:36.0668044Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config	remote.origin.url
2025-12-04T16:19:36.0691389Z Entering 'third_party/protobuf/third_party/benchmark'
2025-12-04T16:19:36.0719012Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config	remote.origin.url
2025-12-04T16:19:36.0737059Z Entering 'third_party/protobuf/third_party/googletest'
2025-12-04T16:19:36.0765528Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config	remote.origin.url
2025-12-04T16:19:36.0785660Z Entering 'third_party/psimd'
2025-12-04T16:19:36.0814839Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config	remote.origin.url
2025-12-04T16:19:36.0833140Z Entering 'third_party/pthreadpool'
2025-12-04T16:19:36.0861793Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config	remote.origin.url
2025-12-04T16:19:36.0880385Z Entering 'third_party/pybind11'
2025-12-04T16:19:36.0909381Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T16:19:36.0927907Z Entering 'third_party/python-peachpy'
2025-12-04T16:19:36.0957348Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config	remote.origin.url
2025-12-04T16:19:36.0975695Z Entering 'third_party/sleef'
2025-12-04T16:19:36.1004735Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config	remote.origin.url
2025-12-04T16:19:36.1023258Z Entering 'third_party/tensorpipe'
2025-12-04T16:19:36.1051796Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config	remote.origin.url
2025-12-04T16:19:36.1069566Z Entering 'third_party/tensorpipe/third_party/googletest'
2025-12-04T16:19:36.1097162Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config	remote.origin.url
2025-12-04T16:19:36.1115599Z Entering 'third_party/tensorpipe/third_party/libnop'
2025-12-04T16:19:36.1143241Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config	remote.origin.url
2025-12-04T16:19:36.1160756Z Entering 'third_party/tensorpipe/third_party/libuv'
2025-12-04T16:19:36.1188056Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config	remote.origin.url
2025-12-04T16:19:36.1208474Z Entering 'third_party/tensorpipe/third_party/pybind11'
2025-12-04T16:19:36.1238286Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config	remote.origin.url
2025-12-04T16:19:36.1255061Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2025-12-04T16:19:36.1283683Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config	remote.origin.url
2025-12-04T16:19:36.1325992Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1358078Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1386395Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1416326Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1445556Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NVTX/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1473753Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1504317Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1531613Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1562198Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/aiter/modules/3rdparty/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1589956Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1619043Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1647867Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpp-httplib/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1676268Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1704454Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1733973Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1761517Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1789727Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/asmjit/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1818349Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1846632Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cpuinfo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1874917Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1904323Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1932630Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/hipify_torch/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1961031Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/external/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.1988179Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2016102Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/composable_kernel/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2043616Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flash-attention/modules/csrc/cutlass/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2070826Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2098606Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2128089Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2155140Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2182769Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2210414Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2238670Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2266767Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2296347Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2325808Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2354151Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/DCGM/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2382983Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/cpr/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2412954Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2442010Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2470674Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/gflags/modules/doc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2498421Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/glog/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2528342Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2556809Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2586285Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/pfs/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2635453Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2650338Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2679013Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/dynolog/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2709130Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2737208Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2764979Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kleidiai/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2792430Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/mimalloc/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2820590Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2847725Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2875497Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2903605Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2931360Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2959616Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.2986911Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/ms-gsl/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3015907Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/nlohmann-json/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3043984Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentelemetry-proto/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3071575Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/opentracing-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3099810Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3128583Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/civetweb/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3157236Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/third_party/prometheus-cpp/modules/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3183854Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/opentelemetry-cpp/modules/tools/vcpkg/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3211136Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3245177Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3273503Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3300597Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3328476Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3356344Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3384599Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3414424Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3443045Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3471001Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3498540Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3527013Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3556978Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3585725Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3614070Z [command]/usr/bin/git config --file /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config --name-only --get-regexp ^includeIf\.gitdir:
2025-12-04T16:19:36.3726048Z A job completed hook has been configured by the self-hosted runner administrator
2025-12-04T16:19:36.3742578Z ##[group]Run '/home/ec2-user/runner-scripts/after_job.sh'
2025-12-04T16:19:36.3748694Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2025-12-04T16:19:36.3749148Z ##[endgroup]
2025-12-04T16:19:36.3847728Z [!ALERT!] Swap in detected! [!ALERT!]
2025-12-04T16:19:48.4526127Z [!ALERT!] Swap out detected [!ALERT!]
2025-12-04T16:20:08.6707070Z Cleaning up orphan processes